Batch Streaming MODWT (Extensions)
This guide shows how to stream-process batches of signals using the SIMD-optimized SoA kernels under the hood, while keeping a user-friendly Array-of-Arrays (AoS) API.
Where SIMD lives
Batch streaming SIMD support is part of the optional vectorwave-extensions module (Java 25 + incubator). The core module provides scalar streaming; to use the SIMD paths, enable the extensions module and run with --add-modules jdk.incubator.vector --enable-preview.
When to use
- You process many signals in fixed-size blocks (e.g., real-time audio, sensor ticks in windows).
- You want predictable per-block latency and high throughput on mid/large blocks.
- You accept per-block boundary semantics (see below), or you use PERIODIC mode.
API Overview
- Single level (AoS I/O):
BatchMODWT.singleLevelAoS(wavelet, double[][] signals) -> SingleLevelResult { double[][] approx, detail }
- Multi level (AoS I/O):
BatchMODWT.multiLevelAoS(wavelet, double[][] signals, int levels) -> MultiLevelResult { double[][][] detailPerLevel, double[][] finalApprox }
- Streaming facade (per-block):
BatchStreamingMODWT.Builder().wavelet(...).boundary(PERIODIC|ZERO_PADDING|SYMMETRIC).levels(L).build(), then processSingleLevel(double[][] block) or processMultiLevel(double[][] block)
All arrays are AoS: [batch][length] for inputs/outputs and [levels][batch][length] for per-level details.
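The relationship between the AoS arrays at the API boundary and an SoA buffer can be sketched as a plain transposition. This is an illustration only: the class name AosSoaLayout is hypothetical, and the sample-major layout soa[t * batch + b] is an assumption, since this guide does not document the layout the kernels actually use.

```java
// Hypothetical helper for illustration; not part of the VectorWave API.
final class AosSoaLayout {
    private AosSoaLayout() {}

    /** Packs AoS [batch][length] into a flat sample-major buffer: soa[t * batch + b]. */
    static double[] toSoA(double[][] aos) {
        int batch = aos.length;
        int length = batch == 0 ? 0 : aos[0].length;
        double[] soa = new double[batch * length];
        for (int b = 0; b < batch; b++) {
            if (aos[b].length != length) {
                throw new IllegalArgumentException("all signals must have the same length");
            }
            for (int t = 0; t < length; t++) {
                // Sample t of every signal sits in one contiguous run of `batch` values.
                soa[t * batch + b] = aos[b][t];
            }
        }
        return soa;
    }

    /** Unpacks a flat sample-major buffer back into AoS [batch][length]. */
    static double[][] toAoS(double[] soa, int batch, int length) {
        if (soa.length != batch * length) {
            throw new IllegalArgumentException("buffer size does not match batch * length");
        }
        double[][] aos = new double[batch][length];
        for (int b = 0; b < batch; b++) {
            for (int t = 0; t < length; t++) {
                aos[b][t] = soa[t * batch + b];
            }
        }
        return aos;
    }
}
```

With this layout, the same time index of neighbouring signals is adjacent in memory, which is what lets one vector instruction advance several signals at once.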
Padding Strategies
- PERIODIC: Uses SIMD SoA kernels (BatchSIMDMODWT) for maximum throughput; no per-instance state is required across blocks. Each block is treated as circular.
- ZERO_PADDING and SYMMETRIC: Backed by SIMD streaming with a left-history (ring buffer) per level. The facade maintains per-level SoA history across calls and applies the same upsampled, 1/√2-scaled analysis filters as core. Concatenating the block outputs in order matches the core whole-signal transform.
- First block initialization: history is zero-filled (ZERO_PADDING) or synthesized via symmetric reflection from the current block (SYMMETRIC).
- Subsequent blocks: history is updated from the previous block’s tail; no right-context is required for MODWT’s (t − l) convolution.
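The left-history idea can be sketched for a single signal and a single level as follows. The sketch assumes zero-filled initial history (the ZERO_PADDING case) and uses a small shifted array rather than a true ring buffer. The class name LeftHistoryFilter is hypothetical, and this is not the library's implementation.

```java
// Hypothetical single-signal sketch of left-history streaming; not VectorWave code.
final class LeftHistoryFilter {
    private final double[] filter;   // analysis taps f[0..L-1] (may already be upsampled)
    private final double[] history;  // last L-1 inputs, oldest first; zero-filled at start

    LeftHistoryFilter(double[] filter) {
        if (filter.length == 0) {
            throw new IllegalArgumentException("filter must have at least one tap");
        }
        this.filter = filter.clone();
        this.history = new double[filter.length - 1];
    }

    /** Computes y[t] = sum_l f[l] * x[t - l], reading x[t - l] from history when t - l < 0. */
    double[] process(double[] block) {
        int h = history.length;
        int n = block.length;
        double[] out = new double[n];
        for (int t = 0; t < n; t++) {
            double acc = 0.0;
            for (int l = 0; l < filter.length; l++) {
                int idx = t - l;
                double x = idx >= 0 ? block[idx] : history[h + idx];
                acc += filter[l] * x;
            }
            out[t] = acc;
        }
        // New history = last h samples of (old history followed by this block).
        double[] next = new double[h];
        for (int i = 0; i < h; i++) {
            int src = n - h + i;
            next[i] = src >= 0 ? block[src] : history[i + n];
        }
        System.arraycopy(next, 0, history, 0, h);
        return out;
    }
}
```

Because only samples at or before t are read, feeding a signal in blocks of any size and concatenating the outputs gives the same result as one call over the whole signal.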
Usage
try (var streaming = new BatchStreamingMODWT.Builder()
        .wavelet(com.morphiqlabs.wavelet.api.Daubechies.DB4)
        .boundary(com.morphiqlabs.wavelet.api.PaddingStrategies.PERIODIC) // or ZERO_PADDING/SYMMETRIC (SIMD streaming)
        .levels(3)
        .build()) {
    for (double[][] block : blocks) { // AoS [batch][blockSize]
        var out = streaming.processMultiLevel(block);
        double[][][] dpl = out.detailPerLevel();
        double[][] approx = out.finalApprox();
        // ... consume outputs ...
    }
    // Optional: emit a synthetic tail at end-of-stream (ZERO_PADDING/SYMMETRIC only).
    // Use accessors to choose a valid tail length:
    int minTail = streaming.getMinFlushTailLength();
    var tail = streaming.flushMultiLevel(minTail);
}
Performance Notes
- SIMD speedups are realized in all modes here; PERIODIC has the least overhead, while ZERO_PADDING/SYMMETRIC add small per-level ring-buffer maintenance.
- For best SIMD utilization, choose batch sizes that are multiples of the platform vector length:
- Query MODWTBatchSIMD.getVectorLength() or MODWTBatchSIMD.getBatchSIMDInfo().
- Use larger block sizes when possible to amortize per-call overhead.
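One way to act on the batch-size advice is to pad the batch up to the next multiple of the lane count. The sketch below takes the lane count as a plain parameter; in practice it would presumably come from MODWTBatchSIMD.getVectorLength(), but its return type and units are not stated in this guide, so that is an assumption. BatchSizing and its methods are hypothetical names.

```java
// Hypothetical sizing helpers for illustration; not part of the VectorWave API.
final class BatchSizing {
    private BatchSizing() {}

    /** Returns the smallest multiple of lanes that is >= batch. */
    static int roundUpToLanes(int batch, int lanes) {
        if (batch < 0 || lanes <= 0) {
            throw new IllegalArgumentException("batch must be >= 0 and lanes > 0");
        }
        return ((batch + lanes - 1) / lanes) * lanes;
    }

    /** Pads an AoS batch with all-zero signals so its size is a multiple of lanes. */
    static double[][] padBatch(double[][] signals, int lanes) {
        int padded = roundUpToLanes(signals.length, lanes);
        int length = signals.length == 0 ? 0 : signals[0].length;
        double[][] out = new double[padded][];
        for (int b = 0; b < padded; b++) {
            // Real signals are shared by reference; padding rows are fresh zero arrays.
            out[b] = b < signals.length ? signals[b] : new double[length];
        }
        return out;
    }
}
```

The outputs for the padding rows can simply be discarded after the transform; only the first signals.length rows carry data.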
Correctness and Parity
- Multi-level AoS batch functions are parity-aligned with core MultiLevelMODWTTransform (periodic), using identical scaling and à trous upsampling.
- Unit tests validate:
- Single-level and multi-level streaming parity for ZERO_PADDING and SYMMETRIC vs core whole-signal transforms (concatenated blocks).
- Single-level parity (Haar, DB4) vs core for AoS batches.
- Multi-level per-level parity and final approximation vs core.
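The scaling and à trous upsampling mentioned above follow the standard MODWT construction: DWT taps are divided by √2, and at level j the filter has 2^(j-1) - 1 zeros inserted between taps. The sketch below shows that construction in isolation. ATrousSketch is a hypothetical name, and whether historyLength matches what getHistoryLengthForLevel(level) reports is an assumption.

```java
// Hypothetical sketch of standard MODWT filter preparation; not VectorWave code.
final class ATrousSketch {
    private ATrousSketch() {}

    /** Scales DWT taps by 1/sqrt(2) to obtain MODWT taps. */
    static double[] modwtScale(double[] dwtTaps) {
        double[] out = new double[dwtTaps.length];
        for (int l = 0; l < dwtTaps.length; l++) {
            out[l] = dwtTaps[l] / Math.sqrt(2.0);
        }
        return out;
    }

    /** Inserts 2^(level-1) - 1 zeros between taps; level is 1-based. */
    static double[] upsample(double[] taps, int level) {
        if (taps.length == 0 || level < 1) {
            throw new IllegalArgumentException("need at least one tap and level >= 1");
        }
        int stride = 1 << (level - 1);
        double[] out = new double[(taps.length - 1) * stride + 1];
        for (int l = 0; l < taps.length; l++) {
            out[l * stride] = taps[l];
        }
        return out;
    }

    /** Past samples an upsampled filter reaches back at a level: (L - 1) * 2^(level-1). */
    static int historyLength(int tapCount, int level) {
        if (tapCount < 1 || level < 1) {
            throw new IllegalArgumentException("need tapCount >= 1 and level >= 1");
        }
        return (tapCount - 1) * (1 << (level - 1));
    }
}
```

For an 8-tap filter such as DB4, this gives a reach of 7, 14, and 28 past samples at levels 1, 2, and 3.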
Inverse APIs
- Single-level inverse (AoS):
BatchMODWT.inverseSingleLevelAoS(wavelet, approxAoS, detailAoS) -> reconstructedAoS
- Multi-level inverse (AoS):
BatchMODWT.inverseMultiLevelAoS(wavelet, detailPerLevelAoS, finalApproxAoS) -> reconstructedAoS
These use core inverse transforms internally for correctness.
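To illustrate what a single-level inverse has to satisfy, here is a minimal periodic Haar MODWT round trip for one signal. It is a sketch of the textbook transform, not the core implementation referred to above. The class name is hypothetical, and the sign convention chosen for the high-pass taps is an assumption that may differ from the library's.

```java
// Hypothetical textbook sketch of a periodic Haar MODWT round trip; not VectorWave code.
final class PeriodicHaarModwt {
    private PeriodicHaarModwt() {}

    // Haar taps already scaled by 1/sqrt(2) for the MODWT.
    static final double[] LOW = {0.5, 0.5};
    static final double[] HIGH = {0.5, -0.5};

    /** Circular analysis: out[t] = sum_l f[l] * x[(t - l) mod N]. */
    static double[] analyze(double[] x, double[] f) {
        int n = x.length;
        double[] out = new double[n];
        for (int t = 0; t < n; t++) {
            double acc = 0.0;
            for (int l = 0; l < f.length; l++) {
                acc += f[l] * x[Math.floorMod(t - l, n)];
            }
            out[t] = acc;
        }
        return out;
    }

    /** Circular synthesis: x[t] = sum_l LOW[l] * a[(t + l) mod N] + HIGH[l] * d[(t + l) mod N]. */
    static double[] synthesize(double[] approx, double[] detail) {
        int n = approx.length;
        if (detail.length != n) {
            throw new IllegalArgumentException("approx and detail must have the same length");
        }
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            double acc = 0.0;
            for (int l = 0; l < LOW.length; l++) {
                int idx = (t + l) % n;
                acc += LOW[l] * approx[idx] + HIGH[l] * detail[idx];
            }
            x[t] = acc;
        }
        return x;
    }
}
```

Analysis looks backwards in time (t - l) and synthesis looks forwards (t + l); with matching taps the two cancel and the original samples are recovered up to rounding.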
Tips
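- PERIODIC mode treats each block as circular, so it gives no continuity across blocks. If you need cross-block continuity, use ZERO_PADDING or SYMMETRIC (which carry history across calls), or consider overlap-and-add at the application layer or larger windows.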
- For peak throughput with many levels, pre-size arrays and reuse builders.
- For end-of-stream padding, use flushSingleLevel(tailLen) or flushMultiLevel(tailLen) to get a final tail under ZERO_PADDING/SYMMETRIC.
- Use getMinFlushTailLength() to query the largest universally valid tail length across levels, or getHistoryLengthForLevel(level) for the per-level maximum in single-level modes.