Shannon's Hatchet

something, but almost nothing

Friday, February 24, 2023

TypeScript modules With Emscripten and CMake, part 2

When I set out to create an NPM package for SoundSwallower, I was unable to find much relevant information in the Emscripten documentation or elsewhere on the Web, so I have written this guide, which walks through the process of compiling a library to WebAssembly and packaging it as a CommonJS or ES6 module.

This is part of a series of posts. Start here to read from the beginning.

Exporting functions

In the previous episode, we successfully built a CommonJS module and accompanying WebAssembly file, which we loaded in Node with require, but which contains no useful code. There are two reasons for this:

The first order of business, then, is to get some functions exported. There are exactly four of these that we care about, and while we could put them on the emcc command-line or declare them as exported in the code, it is better in the long run to list them in a text file. So, create the file exported_functions.txt containing:

_kiss_fftr_alloc
_kiss_fftr
_malloc
_free

You’ll notice we had to add a leading underscore to the names, which is an Emscripten convention, for some reason. Now, if you run:

emcc -o kissfft.js jsbuild/libkissfft-float.a -sMODULARIZE=1 \
    -sEXPORTED_FUNCTIONS=@exported_functions.txt

You should see a kissfft.wasm of a more impressive size. More to the point, you can, actually, import the module and run these functions. Create the script test_kissfft.js containing:

const assert = require("assert");
require("./kissfft.js")().then((kissfft) => {
  let fftr = kissfft._kiss_fftr_alloc(16, 0, 0, 0);
  assert.ok(fftr);
  console.log(`fftr is ${fftr}`);
});

When you run it, it will produce some quite useless output resembling:

$ node test_kissfft.js
fftr is 70216

Um, hooray? This takes a bit of explaining, which will also explain why kissfft.js exists and why it’s so incredibly huge (don’t worry, we will make it smaller eventually).

In the beginning, Emscripten would compile all your code into JavaScript, and exported functions would be attributes on a global module object called, obviously, Module. This is not so great if you want to make a CommonJS or ES6 module instead of just mashing everything into the global namespace like it’s 1995. Also, if your code is quite large, you might want to do other things (like display a web page) while it loads and compiles.

So, when you pass -sMODULARIZE=1 to emcc, what you get when you call require on the generated JavaScript is not the module itself, but rather a function returning a Promise to return that module. This is necessary because currently you can’t directly import WebAssembly from JavaScript code but must use the asynchronous-only WebAssembly API. The various code in kissfft.js handles the loading of the WebAssembly and exporting its functions into a module object.

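Although we use this Promise directly above, it is usually more convenient to await it. In a CommonJS script this has to happen inside an async function, since top-level await is only available in ES modules:

const createModule = require("./kissfft.js");
(async () => {
  const kissfft = await createModule(); // use kissfft inside this function
})();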
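
To make the shape of that factory function a bit less mysterious, here is a minimal sketch of the same pattern, written by hand. It is not what emcc actually generates: the byte array is a tiny hand-assembled WebAssembly module exporting a single add function, standing in for kissfft.wasm, and the names createModule, main and _add are only illustrative.

```javascript
// A tiny hand-assembled WebAssembly module exporting add(a, b).
// It is only a stand-in for kissfft.wasm so that this sketch runs anywhere.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic number and version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body
]);

// Hypothetical factory with the same shape as the generated one: calling it
// returns a Promise which resolves to an object holding the exported functions.
function createModule() {
  return WebAssembly.instantiate(wasmBytes).then(({ instance }) => ({
    _add: instance.exports.add,
  }));
}

// await only works inside an async function in a CommonJS script.
async function main() {
  const mod = await createModule();
  console.log(`2 + 3 is ${mod._add(2, 3)}`);
  return mod;
}

main();
```

The real kissfft.js does a great deal more than this (locating the .wasm file, setting up memory, providing the runtime), which is part of why it is so large.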

Calling C functions directly

With all that explanation out of the way, let’s get down to the business of actually using the functions we exported above.

As noted previously, these become properties of the object that we get when the Promise returned by the function that require returns is resolved (what a mouthful!). They otherwise work just like they do in C, with the obvious exception that all of their arguments are simple JavaScript numbers with no type-checking whatsoever.

What does this mean if you have strings, pointers, etc? Well, you can use the Emscripten utilities ccall and cwrap for simple cases, especially involving C strings. The rest of that page is not worth reading (WebIDL, inlining JavaScript in C… who does that?) with the exception of the section on directly accessing memory, which is super important, because if you have any kind of interesting data, that is what you will have to do, but also super wrong (as of writing this), because it misses one extremely important detail. Read on to find out which one!

As a reminder, we are doing all this because we have some time-domain data, which we would like to transform into frequency-domain data. Specifically, we are using the real-valued FFT function kiss_fftr, which reads from an array of nfft real values (in this case, float which is 32 bits by definition) and writes to an array of nfft / 2 + 1 complex numbers, which are represented with a struct that looks like this:

typedef struct {
    kiss_fft_scalar r; // (NOTE: these are floats)
    kiss_fft_scalar i;
} kiss_fft_cpx;

No, it’s not really documented how Emscripten organizes and aligns structs in memory, but we can be reasonably sure that they are packed, and we can assume an array of 9 kiss_fft_cpx is equivalent to an array of 18 float.
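
Under that assumption (two packed 32-bit floats per struct, r first and i second), picking complex values out of a flat float array is just index arithmetic. The helpers complexAt and magnitudes below are hypothetical names for this sketch and are not part of Kiss FFT or Emscripten.

```javascript
// Hypothetical helper: read the k-th kiss_fft_cpx from a flat float array,
// ASSUMING each struct is two packed 32-bit floats (r, then i).
function complexAt(floats, k) {
  return { r: floats[2 * k], i: floats[2 * k + 1] };
}

// Hypothetical helper: magnitude of every complex bin in the array.
function magnitudes(floats) {
  const mags = new Float32Array(floats.length / 2);
  for (let k = 0; k < mags.length; k++) {
    const { r, i } = complexAt(floats, k);
    mags[k] = Math.hypot(r, i);
  }
  return mags;
}

// Three made-up complex values: 0+0i, 3+4i, 0-4i
const example = new Float32Array([0, 0, 3, 4, 0, -4]);
const mags = magnitudes(example);
console.log(Array.from(mags)); // magnitudes 0, 5 and 4
```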

So let’s create these arrays! We’ll do this by creating a Float32Array on the JavaScript side then copying it into the module’s memory space (sadly there is no more efficient way to do this). To keep things manageable, we’ll do a 16-point FFT of a waveform with a single component at half the Nyquist frequency (yes, really). You can see the equivalent C code here.

const timedata = new Float32Array([0, 0.5, 0, -0.5,
                                   0, 0.5, 0, -0.5,
                                   0, 0.5, 0, -0.5,
                                   0, 0.5, 0, -0.5]);

Now we’ll figure the length of the input and the output and allocate them in the module’s address space (you may be asking, but why aren’t we checking the return value of _malloc? By default, Emscripten is configured to panic if malloc fails instead of returning NULL):

const nfft = timedata.length;
const nfreq = nfft / 2 + 1;
const ctimedata = kissfft._malloc(nfft * 4); // float
const cfreqdata = kissfft._malloc(nfreq * 4 * 2); // complex
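
If the bare 4 and 4 * 2 look a little too magic, the same arithmetic can be spelled out with Float32Array.BYTES_PER_ELEMENT. This is only a sketch, and fftrBufferSizes is a made-up name. It rejects odd sizes simply because nfft / 2 + 1 is not a whole number for them.

```javascript
const FLOAT_BYTES = Float32Array.BYTES_PER_ELEMENT; // 4 bytes per 32-bit float

// Hypothetical helper: byte sizes of the input and output buffers
// for a real-valued FFT of nfft points.
function fftrBufferSizes(nfft) {
  if (nfft % 2 !== 0) {
    throw new RangeError("nfft must be even");
  }
  const nfreq = nfft / 2 + 1; // number of complex output bins
  return {
    nfreq,
    timeBytes: nfft * FLOAT_BYTES, // nfft floats
    freqBytes: nfreq * 2 * FLOAT_BYTES, // nfreq structs of two floats each
  };
}

const sizes = fftrBufferSizes(16);
console.log(sizes); // 9 bins, 64 bytes of input, 72 bytes of output
```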

And we will copy the data to the address we allocated using set on the module’s HEAP8 array, first taking a view of it as a Uint8Array. THIS IS SUPER IMPORTANT! as otherwise JavaScript will helpfully convert each of your floating-point values to an 8-bit integer, which is definitely not what you want:

kissfft.HEAP8.set(new Uint8Array(timedata.buffer), ctimedata);
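
To see the difference without building anything, here is a small sketch using an ordinary ArrayBuffer as a stand-in for the module's memory. The names memory, HEAP8 and ptr are local to this example, and the "pointer" is simply made up rather than coming from _malloc.

```javascript
// Stand-in for the module's memory: one ArrayBuffer with an Int8Array view
// over it, which is how HEAP8 behaves. No WebAssembly build is needed here.
const memory = new ArrayBuffer(64);
const HEAP8 = new Int8Array(memory);
const timedata = new Float32Array([0.5, -0.5, 1.5, 300]);
const ptr = 16; // pretend this came from _malloc

// Wrong: set() converts element by element, one float to one 8-bit integer.
HEAP8.set(timedata, ptr);
const wrong = Array.from(HEAP8.subarray(ptr, ptr + timedata.length));
console.log(wrong); // [ 0, 0, 1, 44 ], which is definitely not our data

// Right: view the floats as raw bytes and copy those instead.
HEAP8.set(new Uint8Array(timedata.buffer), ptr);
const roundTrip = new Float32Array(memory, ptr, timedata.length);
console.log(Array.from(roundTrip)); // [ 0.5, -0.5, 1.5, 300 ]
```

One thing to watch for: new Uint8Array(timedata.buffer) covers the whole underlying buffer, so if timedata is itself a view onto part of a larger buffer, you will want to pass timedata.byteOffset and timedata.byteLength as well.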

Great! Now we can just call the function with the “pointers” (which are really just indices into HEAP8) we created above:

kissfft._kiss_fftr(fftr, ctimedata, cfreqdata);

To get the output array, we use slice to make a copy (confusingly, this is the opposite of what a “slice” does in every other programming language in existence) of the memory and then take a view of it as a Float32Array:

const freqdata = new Float32Array(
      kissfft.HEAP8.slice(cfreqdata, cfreqdata + nfreq * 4 * 2).buffer);
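
The copy-versus-view distinction can be checked with plain typed arrays. In this sketch HEAP8 is again just a local stand-in for the module's heap, and the two floats written into it are arbitrary.

```javascript
// Stand-in heap with two floats written at a made-up "pointer".
const HEAP8 = new Int8Array(32);
const ptr = 8;
HEAP8.set(new Uint8Array(new Float32Array([1.5, -4]).buffer), ptr);

// slice() copies the bytes into a fresh 8-byte ArrayBuffer, so taking
// .buffer of the result gives exactly the data we asked for.
const copy = new Float32Array(HEAP8.slice(ptr, ptr + 8).buffer);

// subarray() is only a view: its .buffer is still the whole heap, so the
// offset and length have to be passed along explicitly.
const view = HEAP8.subarray(ptr, ptr + 8);
const viewAsFloats = new Float32Array(view.buffer, view.byteOffset, 2);

// Overwrite that region of the heap, as a later C call might do.
HEAP8.fill(0, ptr, ptr + 8);

console.log(Array.from(copy)); // [ 1.5, -4 ], the copy is unaffected
console.log(Array.from(viewAsFloats)); // [ 0, 0 ], the view sees the change
```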

And finally, we should deallocate everything so that we don’t run out of memory:

kissfft._free(ctimedata);
kissfft._free(cfreqdata);
kissfft._free(fftr);
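
Since forgetting one of these calls (or skipping it because something threw an exception) will leak memory, one possible pattern is to wrap the allocation in try/finally. This is only a sketch: withBuffer is a hypothetical helper, and fakeModule is a stand-in which merely records allocations so that the example runs without the real module.

```javascript
// Hypothetical helper: allocate nbytes in the module's heap, pass the
// pointer to fn, and free it afterwards even if fn throws.
function withBuffer(module, nbytes, fn) {
  const ptr = module._malloc(nbytes);
  try {
    return fn(ptr);
  } finally {
    module._free(ptr);
  }
}

// Stand-in for the real module: it only tracks which "pointers" are live.
const fakeModule = {
  live: new Set(),
  next: 8,
  _malloc(nbytes) {
    const ptr = this.next;
    this.next += nbytes;
    this.live.add(ptr);
    return ptr;
  },
  _free(ptr) {
    this.live.delete(ptr);
  },
};

const result = withBuffer(fakeModule, 64, (ptr) => ptr * 2);
console.log(result, fakeModule.live.size); // nothing is left allocated
```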

You can download the full test script here.
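
As a cross-check on what the output ought to look like, here is a naive (and very slow) DFT of the same input in plain JavaScript. It assumes the usual unscaled forward transform convention and produces the same interleaved real/imaginary layout described above. The function name naiveRealDft is made up for this sketch.

```javascript
// Naive O(n^2) DFT of real input, returning nfft / 2 + 1 complex bins
// as interleaved (real, imaginary) floats. Assumes the usual unscaled
// forward convention, X[k] = sum of x[n] * exp(-2 * pi * i * k * n / nfft).
function naiveRealDft(timedata) {
  const nfft = timedata.length;
  const nfreq = nfft / 2 + 1;
  const out = new Float32Array(nfreq * 2);
  for (let k = 0; k < nfreq; k++) {
    let re = 0;
    let im = 0;
    for (let n = 0; n < nfft; n++) {
      const angle = (-2 * Math.PI * k * n) / nfft;
      re += timedata[n] * Math.cos(angle);
      im += timedata[n] * Math.sin(angle);
    }
    out[2 * k] = re;
    out[2 * k + 1] = im;
  }
  return out;
}

const timedata = new Float32Array([0, 0.5, 0, -0.5,
                                   0, 0.5, 0, -0.5,
                                   0, 0.5, 0, -0.5,
                                   0, 0.5, 0, -0.5]);
const spectrum = naiveRealDft(timedata);
// All the energy should land in bin 4, as approximately 0 - 4i
console.log(spectrum[8], spectrum[9]);
```

The input repeats every 4 samples, so with 16 points its single component falls in bin 16 / 4 = 4, with every other bin close to zero.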

Obviously, it’s not a great idea to have an API full of functions with arbitrary number parameters and no type-checking, though this is JavaScript, after all, so perhaps some people consider that to be totally acceptable. In the next installment, we will find out how to make a safer and more programmer-friendly API!

Friday, February 24, 2023

TypeScript modules With Emscripten and CMake, part 1

When I set out to create an NPM package for SoundSwallower, I was unable to find much relevant information in the Emscripten documentation or elsewhere on the Web, so I have written this guide, which will hopefully be useful to anyone doing similar things.

This is the first of a series of posts, which will be collected in a single table of contents when complete.

Should you read this guide?

If you have, or wish to write, a library written in C, or which exposes a C API, and which uses CMake as its build system, and you wish to package it:

  • as a CommonJS or ES6 module
  • for the Web and for Node.js
  • with type definitions for TypeScript
  • on npmjs.com

Then yes, you should read this guide!

If you have a library written in Rust, then you don’t need this guide, since you have much better documentation already.

If you have a library written in C++ which exposes a C++ API, you should really reconsider your life choices.

Prerequisites

I presume, here, that you know what WebAssembly is and roughly how it works, and that you have already installed:

I have only tested this stuff on Ubuntu 22.04 with Chrome and Firefox. If there is anything missing to make it work on reasonable development environments elsewhere, let me know. It ought to work with Windows Subsystem for Linux or MSYS2, and probably can work on macOS as well.

Overview

As I mentioned, this guide comes from my experience with SoundSwallower, and if you want, you can just go and see how I did things there. Because it’s a fairly complicated library and API, I won’t use it for this guide, but rather something simpler, which even has “simple” in its name, specifically Kiss FFT, a library which implements the Fast Fourier Transform (FFT from here on).

Although, predictably, a complete WebAssembly package for Kiss FFT is already available on npm, the goal here is not to compete with that package, but to demonstrate the workings of Emscripten and the process of integrating a WebAssembly build into a CMake build system. If you just want to do an FFT in your application, I strongly recommend using that existing package!

We will go through the process of building a library, then we will test this library in an excessively simple web application, which records from your microphone and displays a spectrogram.

Why would you ever want to do this? Well, the Web Audio API, in its infinite wisdom, provides an FFT implementation that is useless for anything other than making an animated VU meter, and though there is a good and fast FFT implementation inside WebRTC, there is no way to access it, or any of the other useful WebRTC functionality like voice activity detection from JavaScript. So, if we want to do something actually useful, we have no choice but to reimplement an FFT, either in JavaScript (this works just fine, and is what wavesurfer.js does) or in some other language, which we will compile to WebAssembly (or JavaScript). As you can guess, we will do the latter, since that’s the subject of this guide!

Since we are keeping this simple, we will create a module which wraps exactly one API, namely the real-valued FFT in kiss_fftr.h. Nonetheless, this is a very typical C API, so it should be applicable to various other libraries.

Initial build

First, let’s make sure that we can actually build Kiss FFT with Emscripten. To check out the source code and configure it to build in a subdirectory called jsbuild:

git clone https://github.com/mborgerding/kissfft.git
cd kissfft
# This is expected to fail
emcmake cmake -S . -B jsbuild -DCMAKE_BUILD_TYPE=Debug

Oops! That didn’t work at all, because it depends on some other libraries that Emscripten doesn’t know about. We are not going to port these libraries here - instead, we’ll look in the README, which helpfully tells us how to disable those parts of the build, and also to build it as a static library only:

emcmake cmake -S . -B jsbuild -DCMAKE_BUILD_TYPE=Debug \
    -DKISSFFT_TOOLS=OFF -DKISSFFT_STATIC=ON -DKISSFFT_TEST=OFF

Much better! We can build the library, giving us a file called libkissfft-float.a which we don’t exactly know what to do with:

cmake --build jsbuild

This is not a WebAssembly file, nor is it a JavaScript file, but some kind of intermediate representation that Emscripten will turn into one or both of them. To produce a JavaScript/WebAssembly library, we have to “link” it into a JavaScript module. Ultimately we’ll set up CMake to do this for us, but just to try it out, you can run:

emcc -o kissfft.js jsbuild/libkissfft-float.a -sMODULARIZE=1

This will produce the files kissfft.js and kissfft.wasm in the current directory. You should even be able to import this module in the Node REPL:

> require("./kissfft.js");

Hooray! You have built a module … or have you?

On further inspection you’ll notice that kissfft.wasm is suspiciously small. Indeed, if you “disassemble” it (actually just convert it to text format) using this handy online tool, you will see some Emscripten runtime functions… and definitely nothing Kiss FFT related. If you peruse the copious boilerplate in kissfft.js, same thing. What the h*ck?
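
If you would rather not paste your file into a website, Node can also list what a WebAssembly module imports and exports, using the standard WebAssembly.Module.exports and WebAssembly.Module.imports functions. The sketch below runs on a tiny hand-assembled module exporting a single add function, as a stand-in so that it is self-contained. For the real thing you would pass the contents of kissfft.wasm (read with fs.readFileSync, for example) instead of wasmBytes.

```javascript
// A tiny hand-assembled WebAssembly module exporting add(a, b), used only
// as a stand-in for the bytes of kissfft.wasm.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic number and version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body
]);

// Compiling synchronously is fine in Node for a quick inspection.
const wasmModule = new WebAssembly.Module(wasmBytes);
const exportsList = WebAssembly.Module.exports(wasmModule);
const importsList = WebAssembly.Module.imports(wasmModule);

for (const { kind, name } of exportsList) {
  console.log(`export: ${kind} ${name}`);
}
console.log(`${importsList.length} imports`);
```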

What’s next

In the next post, we will learn how to export functions from C code so they are accessible to JavaScript code, how to pass blocks of data from JavaScript TypedArray objects to C functions, and how to handle returned blocks of data.

Friday, February 10, 2023

Il n'y a rien là!

What am I working on lately?

Mostly ReadAlongs Web (or see the demo). Things I’ve learned along the way are:

  • Dependency management in the JavaScript world is, as they say, a real hell, and somehow even worse than in the Python world, which seems to have made great strides in cleaning up its act with regard to packaging.
  • Package lockfiles do not really help. Perhaps there will be a full-fledged “opinion piece” on this later.
  • Webpack, for all the complaining it attracts, is a mature and well-maintained piece of software that really tries hard to do the right thing, and we should be kind to it.
  • Too many so-called “frameworks” in this so-called “ecosystem”. Most JavaScript-related project webpages are 80% hype, 10% rows and rows of icons for other projects, and 10% buzzwords.
  • Angular, for all the complaining it attracts, is fairly good, if only because it does not use JSX/TSX templating, which is deeply distasteful, or any other weird magic, with the exception of (sigh) dependency injection, which is thoroughly unnecessary. Explicit is always better than implicit (says the Python programmer).
  • There are few problems solved by ReactiveX that can’t be solved with async, await, and generators, and those ones are probably not worth solving.

This is all in preparation for our very exciting workshop which will be presented (virtually) at ICLDC!

Aside from that … stay tuned for various WFST-related things in your browser. More to come.