SIMD Integration (Vector API)
This page consolidates everything you need to know to use VectorWave with SIMD acceleration.
Where SIMD Lives
- Core (
vectorwave-core): Portable scalar Java 25 implementation. - Extensions (
vectorwave-extensions): Optional Java 25 module that provides SIMD/Vector API acceleration and high‑throughput facades (batch AoS, batch streaming).
Core works everywhere; add the extensions module when you want hardware SIMD acceleration.
Requirements
- JDK: Java 25 (or compatible) for the extensions module (uses incubator Vector API).
- JVM flags at compile/run time:
--add-modules jdk.incubator.vector--enable-preview
- Recommended: GraalVM 25.0.1 for +10–20% throughput.
Enabling SIMD
- Add the extensions dependency
<dependency>
<groupId>com.morphiqlabs</groupId>
<artifactId>vectorwave-extensions</artifactId>
<version>1.0.0-SNAPSHOT</version>
</dependency>
- Pass Vector API flags when running
java --add-modules jdk.incubator.vector --enable-preview -jar yourapp.jar
With Maven Surefire/Exec you can supply:
<argLine>
--add-modules=jdk.incubator.vector
--enable-preview
</argLine>
What You Get (Extensions)
- Batch MODWT (AoS facade, SoA under the hood):
com.morphiqlabs.wavelet.modwt.BatchMODWT - Batch Streaming MODWT (AoS facade):
com.morphiqlabs.wavelet.modwt.BatchStreamingMODWT - SIMD kernels and utilities:
com.morphiqlabs.wavelet.internal.VectorOps, batch SIMD inBatchSIMDMODWT - Optimizations: reduced temporary allocations, SoA layouts, FMA in hot loops, thread‑local scratch buffers
Quick Usage Examples
Batch (AoS facade):
double[][] signals = ...; // [batch][length]
int levels = 3;
var out = com.morphiqlabs.wavelet.modwt.BatchMODWT
.multiLevelAoS(com.morphiqlabs.wavelet.api.Daubechies.DB4, signals, levels);
double[][][] dpl = out.detailPerLevel();
double[][] approx = out.finalApprox();
Batch Streaming (AoS facade):
try (var streaming = new com.morphiqlabs.wavelet.modwt.BatchStreamingMODWT.Builder()
.wavelet(com.morphiqlabs.wavelet.api.Daubechies.DB4)
.boundary(com.morphiqlabs.wavelet.api.PaddingStrategies.PERIODIC)
.levels(3)
.build()) {
var block = nextBlock(); // [batch][blockLen]
var res = streaming.processMultiLevel(block);
}
Detecting SIMD at Runtime
The simplest check is attempting to access the preferred species:
boolean simdAvailable;
try {
simdAvailable = jdk.incubator.vector.DoubleVector.SPECIES_PREFERRED.length() > 1;
} catch (Throwable t) {
simdAvailable = false;
}
``;
Core API also exposes a performance hint (always scalar in core):
```java
boolean vectorization = com.morphiqlabs.wavelet.WaveletOperations
.getPerformanceInfo().vectorizationEnabled();
Benchmarks and Validation
- JMH microbenchmarks (allocation/FMA):
mvn -q -pl vectorwave-examples -am exec:java \
-Dexec.mainClass=org.openjdk.jmh.Main \
-Dexec.args="com.morphiqlabs.benchmark.VectorOpsAllocationBenchmark -prof gc -f 1 -wi 5 -i 10"
Expect near‑zero allocations/op for vector paths and higher throughput than scalar.
Related Documentation
-
Guides
- Batch Processing: docs/guides/BATCH_PROCESSING.md
- Batch Streaming (Extensions): docs/guides/BATCH_STREAMING.md
- Streaming: docs/guides/STREAMING.md
- Performance Tuning: docs/guides/PERFORMANCE_TUNING.md
- Developer Guide: docs/guides/DEVELOPER_GUIDE.md
-
Performance
- Performance Guide: docs/PERFORMANCE.md
- Optimizations: docs/performance/OPTIMIZATIONS.md
-
Technical Specs
- Vector API Compilation: docs/technical-specs/VECTOR_API_COMPILATION.md
- Vector API Fallback: docs/technical-specs/VECTOR_API_FALLBACK.md
- FFT Algorithm Notes: docs/technical-specs/FFT_ALGORITHM_ANALYSIS.md
-
Examples
- Examples README: vectorwave-examples/README.md
- BatchStreamingFlushExample: vectorwave-examples/src/main/java/com/morphiqlabs/examples/BatchStreamingFlushExample.java
- VectorOpsAllocationBenchmark: vectorwave-benchmarks/src/main/java/com/morphiqlabs/benchmark/VectorOpsAllocationBenchmark.java
Notes and Caveats
- Vector API is incubator; interfaces may change across JDKs.
- Always pass
--enable-previewand module flags in dev/test/prod when using extensions. - Core and extensions maintain strict parity of results; SIMD affects speed, not correctness.