Your result:

loading and running benchmark...

(Result values are in real-valued time data FFT transforms per second, higher is better. # of iterations is how many iteratios are used to measure execution speed.)

My results in Chrome, Firefox, and old Edge (not the Chromium based):

lib \ FFT size	128	256	512	1024	2048	4096	8192	16384
fft.js 4.0.3	410000 82200 62900	174000 44400 29800	173000 20100 16700	99240 11200 7400	36300 5360 4440	19000 2770 1860	8090 1230 1080	3940 663 434
kissfft extra copy	110966 142000 48734	103000 83100 42000	80800 61000 31100	42800 44400 20600	26000 22200 13200	11600 12200 6740	7140 6780 3990	3170 3210 1820
kissfft zero copy	669000 651000 510588	422000 355000 255000	249000 195000 179000	98000 69700 69200	53000 44400 44400	22000 18800 16100	11700 10200 10300	4810 4360 3770
# of iterations	7812	3906	1953	976	488	244	122	61

See below about extra copy and zero copy bindings for KISS FFT.

Performance overview

I compared fft.js 4.0.3 (which claims to be "The fastest JS Radix-4/Radix-2 FFT implementation"), and KISS FFT compiled into WASM with Emscripten (see the next section for details).

Benchmark result values somewhat vary from run to run, so take the figures with a grain of salt. That's why I rounded them to three significant digits. Orders of magnitudes are pretty consistent though, so you should get rough idea comparing them instead of comparing exact figures.
# of iterations is how many iterations are used to measure execution speed and average over.

fft.js has great performance in Chrome, the same order of magnitude as the best binding implementation of KISS FFT compiled into WASM, which is indeed impressive!
However in other browsers I tested, JS code performs noticeably worse, i.e. several times slower compared to the fastest approach. So we can say that Chromium engineers did a great job squeezing a lot of performance out of execution environment, as well as fft.js author tailoring the code to use the opportunities.

Yet, overall, across different browsers respectively the best execution speed can be achieved with KISS FFT compiled into WASM.

The downside of using KISS FFT is the size of compiled code:
WASM: 33 KB supporting JS: 27 KB
compared to fft.js: 5 KB
Additionally larger WASM blob may result in longer compilation times compared to the tiny JS implementation. I didn't measure it.

In my case I needed direct and inverse transformation for real-valued time data (i.e. direct real→complex, and inverse complex→real).
KISS FFT implements both. fft.js implements only direct transformation for real-valued data, the inverse you'll have to implement yourself, or use the general complex-valued variant, which does not take into account the conjugate symmetry of the complex frequency data and runs noticeably slower than the specialized variant.
Also KISS FFT can handle any size of data, implementing optimizations for factors 2, 3, 4, and 5, while fft.js requires size to be a power of 2.

Conclusion

For my application with potentially thousands of direct and inverse transfromations I think it is beneficial to prefer faster execution speed considering it potentially can save battery life.
I consider download overhead of ~55 KB and potentially longer compilation times on page loading an acceptable tradeoff to gain execution speed several times faster compared to JS code ("only" ~50% better in Chrome).

Compiling KISS FFT into WASM with JS bindings

First we need to get Emscripten SDK. The easiest way is to follow instructions on this page and use emsdk tool.

Sources for KISS FFT can be found in the project's GitHub repo.
To perform direct and inverse transformations for real-valued time data we need to compile kiss_fft.c, which implements general purpose FFT, and tools\kiss_fftr.c, which implements real-valued data stuff.

What's left is to implement bindings. (download the whole example)

Binding with extra copying

To conveniently use it in JS code like this:

const fft = new Module.KissFft(size);
const result = fft.transform(data);

we have to implement bindings. Luckily it is super easy to do once you figure out how to use Embind.
To be able to pass data in javascript Float64Arrays to C++ code and return result back I came up with this implementation (see comments in code):

#include "tools/kiss_fftr.h"

#include <emscripten/bind.h>
#include <emscripten/val.h>

enum class FftDirection {
  Direct,
  Inverse
};

struct KissFftBase {
  KissFftBase(size_t size, FftDirection direction) {
    state = kiss_fftr_alloc(size, direction == FftDirection::Direct ? 0 : 1, nullptr, nullptr);
  }
  ~KissFftBase() {
    kiss_fft_free(state);
  }

  kiss_fftr_state *state;
};

emscripten::val toJSFloat64Array(const std::vector<double> &v) {
  emscripten::val view{ emscripten::typed_memory_view(v.size(), v.data()) };  // make a view of local object

  auto result = emscripten::val::global("Float64Array").new_(v.size());  // make a JS typed array
  result.call<void>("set", view);  // set typed array values "on the JS side" using the memory view

  return result;
}

struct KissFftRealExtraCopy : KissFftBase {
  KissFftRealExtraCopy(size_t size) : KissFftBase{ size, FftDirection::Direct } {}

  auto transform(const emscripten::val &input) const {
    const auto data = emscripten::convertJSArrayToNumberVector<double>(input);  // copy data from argument into local memory

    std::vector<double> output;      // real and imaginary components are interleaved,
    output.resize(data.size() + 2);  // need +1 bin (+2 elements) for Nyquist frequency

    kiss_fftr(state, data.data(), reinterpret_cast<kiss_fft_cpx*>(output.data()));  // reinterpret_cast is needed to call the API

    return toJSFloat64Array(output);  // copy data to a JS Float64Array object and return the object
  }
};

EMSCRIPTEN_BINDINGS(KissFft) {
  emscripten::class_<KissFftRealExtraCopy>("KissFftRealExtraCopy")  // expose class with ctor and the only method
    .constructor<size_t>()
    .function("transform", &KissFftRealExtraCopy::transform)
    ;
}

I put this code in kiss_fft.js.cpp. (This code is used for kissfft extra copy variant in benchmark.)

At this point we can build and use it.

Building

Execute

em++ -I. -Dkiss_fft_scalar=double kiss_fft.c tools\kiss_fftr.c kiss_fft.js.cpp -o build\kissfft.js --bind -O3 --closure 1

em++ is the name of the Emscripten compiler frontend executable. Alternatively you can use emcc.

-Ipath adds includes path. I execute it in the directory with the rest of KISS FFT sources, so it's sufficient to just add the current directory with -I.

-Dkiss_fft_scalar=double defines double, i.e. 64 bit floating point type, as the real scalar type. I tried single and double precision floating point types and their performance is roughly the same, so it's beneficial to use double precision.

--bind tells the compiler to use Embind bindings.

-O3 tells the compiler to optimize the code for speed.

--closure 1 runs Closure compiler to optimize the size of supporting JS code (from 62 KB to 27 KB in my case).

Though providing convenient interface, extra copying incurs noticeable penalty in all cases, sometimes even making WASM implementation slower than JS code.
If we want ultimate speed, we need to eliminate extra copying at the JS-WASM boundary. Fortunately we can do it very easily even without making the interface ugly beyond repair.

Binding with zero copying

//...
struct KissFftReal : KissFftBase {
  KissFftReal(size_t size) : KissFftBase{ size, FftDirection::Direct } {
    input.resize(size);       // initialize buffers with sizes on construction
    output.resize(size + 2);  // complex output needs +1 bin (+2 elements) for Nyquist frequency
  }

  auto transform() const {
    kiss_fftr(state, input.data(), reinterpret_cast<kiss_fft_cpx*>(output.data()));

    return emscripten::val{ emscripten::typed_memory_view(output.size(), output.data()) };  // return view of output data buffer
  }

  auto getInputTimeDataBuffer() {
    return emscripten::val{ emscripten::typed_memory_view(input.size(), input.data()) };  // return view of input data buffer
  }

private:
  std::vector<double> input;
  mutable std::vector<double> output;
};

struct KissFftRealInverse : KissFftBase {
  KissFftRealInverse(size_t size) : KissFftBase{ size, FftDirection::Inverse } {
    input.resize(size + 2);  // complex input should contain +1 bin (+2 elements) for Nyquist frequency
    output.resize(size);
  }

  auto transform() const {
    kiss_fftri(state, reinterpret_cast<const kiss_fft_cpx*>(input.data()), output.data());

    return emscripten::val{ emscripten::typed_memory_view(output.size(), output.data()) };
  }

  auto getInputFrequencyDataBuffer() {
    return emscripten::val{ emscripten::typed_memory_view(input.size(), input.data()) };
  }

private:
  std::vector<double> input;
  mutable std::vector<double> output;
};

EMSCRIPTEN_BINDINGS(KissFft) {  // expose classes each with ctor and two methods
  emscripten::class_<KissFftReal>("KissFftReal")
    .constructor<size_t>()
    .function("getInputTimeDataBuffer", &KissFftReal::getInputTimeDataBuffer)
    .function("transform", &KissFftReal::transform)
    ;

  emscripten::class_<KissFftRealInverse>("KissFftRealInverse")
    .constructor<size_t>()
    .function("getInputFrequencyDataBuffer", &KissFftRealInverse::getInputFrequencyDataBuffer)
    .function("transform", &KissFftRealInverse::transform)
    ;
}

(This code is used for kissfft zero copy variant in benchmark.)

Here we expose buffers allocated on the WASM side as typed memory views to the JS side, no additional allocations or copying is needed.
We can use it like this on JS side:

const fft = new Module.KissFftReal(size);

const input = fft.getInputTimeDataBuffer();  // get buffer
// fill input buffer
const output = fft.transform();  // perform transformation
// use transformation result returned in output buffer

Not terribly aweful if you ask me.
Considering that it offers better performance in comparison with JS implementation (and that inverse transform for real-valued time data is already implemented) I'm gonna use KISS FFT compiled into WASM.