loading and running benchmark...
My results in Chrome, Firefox, and old Edge (not the Chromium based):
lib \ FFT size | 128 | 256 | 512 | 1024 | 2048 | 4096 | 8192 | 16384 |
---|---|---|---|---|---|---|---|---|
fft.js 4.0.3 | 410000 82200 62900 | 174000 44400 29800 | 173000 20100 16700 | 99240 11200 7400 | 36300 5360 4440 | 19000 2770 1860 | 8090 1230 1080 | 3940 663 434 |
kissfft extra copy | 110966 142000 48734 | 103000 83100 42000 | 80800 61000 31100 | 42800 44400 20600 | 26000 22200 13200 | 11600 12200 6740 | 7140 6780 3990 | 3170 3210 1820 |
kissfft zero copy | 669000 651000 510588 | 422000 355000 255000 | 249000 195000 179000 | 98000 69700 69200 | 53000 44400 44400 | 22000 18800 16100 | 11700 10200 10300 | 4810 4360 3770 |
# of iterations | 7812 | 3906 | 1953 | 976 | 488 | 244 | 122 | 61 |
I compared fft.js 4.0.3 (which claims to be "The fastest JS Radix-4/Radix-2 FFT implementation"), and KISS FFT compiled into WASM with Emscripten (see the next section for details).
Benchmark result values somewhat vary from run to run, so take the figures with a grain of salt.
That's why I rounded them to three significant digits.
Orders of magnitudes are pretty consistent though, so you should get rough idea comparing them instead of comparing exact figures.
# of iterations is how many iterations are used to measure execution speed and average over.
fft.js has great performance in Chrome, the same order of magnitude as the best binding implementation of
KISS FFT compiled into WASM,
which is indeed impressive!
However in other browsers I tested, JS code performs noticeably worse, i.e. several times slower compared to the fastest approach.
So we can say that Chromium engineers did a great job squeezing a lot of performance out of execution environment,
as well as fft.js author tailoring the code to use the opportunities.
Yet, overall, across different browsers respectively the best execution speed can be achieved with KISS FFT compiled into WASM.
The downside of using KISS FFT is the size of compiled code:
WASM: 33 KB
supporting JS: 27 KB
compared to fft.js: 5 KB
Additionally larger WASM blob may result in longer compilation times compared to the tiny JS implementation.
I didn't measure it.
In my case I needed direct and inverse transformation for real-valued time data (i.e. direct real→complex, and inverse complex→real).
KISS FFT implements both.
fft.js implements only direct transformation for real-valued data, the inverse you'll have to implement yourself, or
use the general complex-valued variant, which does not take into account the
conjugate symmetry
of the complex frequency data and runs noticeably slower than the specialized variant.
Also KISS FFT can handle any size of data, implementing optimizations for factors 2, 3, 4, and 5,
while fft.js requires size to be a power of 2.
For my application with potentially thousands of direct and inverse transfromations
I think it is beneficial to prefer faster execution speed considering it potentially can save battery life.
I consider download overhead of ~55 KB and potentially longer compilation times on page loading an acceptable tradeoff
to gain execution speed several times faster compared to JS code ("only" ~50% better in Chrome).
First we need to get Emscripten SDK. The easiest way is to follow instructions on
this page
and use emsdk
tool.
Sources for KISS FFT can be found in the project's
GitHub repo.
To perform direct and inverse transformations for real-valued time data we need to compile
kiss_fft.c
, which implements general purpose FFT, and
tools\kiss_fftr.c
, which implements real-valued data stuff.
What's left is to implement bindings. (download the whole example)
To conveniently use it in JS code like this:
const fft = new Module.KissFft(size);
const result = fft.transform(data);
we have to implement bindings. Luckily it is super easy to do once you figure out how to use
Embind.
To be able to pass data in javascript Float64Array
s to C++ code and return result back
I came up with this implementation (see comments in code):
#include "tools/kiss_fftr.h"
#include <emscripten/bind.h>
#include <emscripten/val.h>
enum class FftDirection {
Direct,
Inverse
};
struct KissFftBase {
KissFftBase(size_t size, FftDirection direction) {
state = kiss_fftr_alloc(size, direction == FftDirection::Direct ? 0 : 1, nullptr, nullptr);
}
~KissFftBase() {
kiss_fft_free(state);
}
kiss_fftr_state *state;
};
emscripten::val toJSFloat64Array(const std::vector<double> &v) {
emscripten::val view{ emscripten::typed_memory_view(v.size(), v.data()) }; // make a view of local object
auto result = emscripten::val::global("Float64Array").new_(v.size()); // make a JS typed array
result.call<void>("set", view); // set typed array values "on the JS side" using the memory view
return result;
}
struct KissFftRealExtraCopy : KissFftBase {
KissFftRealExtraCopy(size_t size) : KissFftBase{ size, FftDirection::Direct } {}
auto transform(const emscripten::val &input) const {
const auto data = emscripten::convertJSArrayToNumberVector<double>(input); // copy data from argument into local memory
std::vector<double> output; // real and imaginary components are interleaved,
output.resize(data.size() + 2); // need +1 bin (+2 elements) for Nyquist frequency
kiss_fftr(state, data.data(), reinterpret_cast<kiss_fft_cpx*>(output.data())); // reinterpret_cast is needed to call the API
return toJSFloat64Array(output); // copy data to a JS Float64Array object and return the object
}
};
EMSCRIPTEN_BINDINGS(KissFft) {
emscripten::class_<KissFftRealExtraCopy>("KissFftRealExtraCopy") // expose class with ctor and the only method
.constructor<size_t>()
.function("transform", &KissFftRealExtraCopy::transform)
;
}
I put this code in kiss_fft.js.cpp
.
(This code is used for kissfft extra copy
variant in benchmark.)
At this point we can build and use it.
Execute
em++ -I. -Dkiss_fft_scalar=double kiss_fft.c tools\kiss_fftr.c kiss_fft.js.cpp -o build\kissfft.js --bind -O3 --closure 1
em++
is the name of the Emscripten compiler frontend executable. Alternatively you can use emcc
.
-Ipath
adds includes path. I execute it in the directory with the rest of KISS FFT sources,
so it's sufficient to just add the current directory with -I.
-Dkiss_fft_scalar=double
defines double
, i.e. 64 bit floating point type, as the real scalar type.
I tried single and double precision floating point types and their performance is roughly the same, so it's beneficial to use double precision.
--bind
tells the compiler to use Embind bindings.
-O3
tells the compiler to optimize the code for speed.
--closure 1
runs Closure compiler to optimize the size of supporting JS code
(from 62 KB to 27 KB in my case).
Though providing convenient interface, extra copying incurs noticeable penalty in all cases,
sometimes even making WASM implementation slower than JS code.
If we want ultimate speed, we need to eliminate extra copying at the JS-WASM boundary.
Fortunately we can do it very easily even without making the interface ugly beyond repair.
//...
struct KissFftReal : KissFftBase {
KissFftReal(size_t size) : KissFftBase{ size, FftDirection::Direct } {
input.resize(size); // initialize buffers with sizes on construction
output.resize(size + 2); // complex output needs +1 bin (+2 elements) for Nyquist frequency
}
auto transform() const {
kiss_fftr(state, input.data(), reinterpret_cast<kiss_fft_cpx*>(output.data()));
return emscripten::val{ emscripten::typed_memory_view(output.size(), output.data()) }; // return view of output data buffer
}
auto getInputTimeDataBuffer() {
return emscripten::val{ emscripten::typed_memory_view(input.size(), input.data()) }; // return view of input data buffer
}
private:
std::vector<double> input;
mutable std::vector<double> output;
};
struct KissFftRealInverse : KissFftBase {
KissFftRealInverse(size_t size) : KissFftBase{ size, FftDirection::Inverse } {
input.resize(size + 2); // complex input should contain +1 bin (+2 elements) for Nyquist frequency
output.resize(size);
}
auto transform() const {
kiss_fftri(state, reinterpret_cast<const kiss_fft_cpx*>(input.data()), output.data());
return emscripten::val{ emscripten::typed_memory_view(output.size(), output.data()) };
}
auto getInputFrequencyDataBuffer() {
return emscripten::val{ emscripten::typed_memory_view(input.size(), input.data()) };
}
private:
std::vector<double> input;
mutable std::vector<double> output;
};
EMSCRIPTEN_BINDINGS(KissFft) { // expose classes each with ctor and two methods
emscripten::class_<KissFftReal>("KissFftReal")
.constructor<size_t>()
.function("getInputTimeDataBuffer", &KissFftReal::getInputTimeDataBuffer)
.function("transform", &KissFftReal::transform)
;
emscripten::class_<KissFftRealInverse>("KissFftRealInverse")
.constructor<size_t>()
.function("getInputFrequencyDataBuffer", &KissFftRealInverse::getInputFrequencyDataBuffer)
.function("transform", &KissFftRealInverse::transform)
;
}
(This code is used for kissfft zero copy
variant in benchmark.)
Here we expose buffers allocated on the WASM side as typed memory views to the JS side, no additional allocations or copying is needed.
We can use it like this on JS side:
const fft = new Module.KissFftReal(size);
const input = fft.getInputTimeDataBuffer(); // get buffer
// fill input buffer
const output = fft.transform(); // perform transformation
// use transformation result returned in output buffer
Not terribly aweful if you ask me.
Considering that it offers better performance in comparison with JS implementation
(and that inverse transform for real-valued time data is already implemented)
I'm gonna use KISS FFT compiled into WASM.