Let’s Write a Web Assembly Interpreter (Part 1)

Richard Anaya
Mar 25, 2020


Hey everyone. I’m a bit obsessed with web assembly, so I thought I’d try to write up some somewhat higher-quality information about what I’ve learned. Some months ago I started reading the web assembly spec and poking around at bytes. It all started with a fundamental question: “what’s even in here?” What makes the magic of all this “faster than JavaScript” stuff work? It ended up being pretty fascinating, and several mad science projects later, I recently found myself wanting to make a web assembly interpreter. Let’s learn as we go!

One of the neat things about web assembly is that the structure of everything inside is fairly human comprehensible. We need a way of representing:

  • functions
  • details about our memory
  • globals
  • blobs of data (such as constant strings like “hello world!”)
  • names of things we’d like exported

Each one of these major components has its own little section at the binary level. Each section has, in order:

  • a type ID number that says what type the section is
  • how much data is in the section (so you can hop to the next section after)
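
To make that layout concrete, here is a minimal sketch (mine, not the code from this series) of hopping from section to section. The names `read_u32_leb128` and `list_sections` are made up for illustration. It assumes the standard binary layout: an 8-byte header (the magic bytes "\0asm" plus a 4-byte version), then each section as one ID byte, a size stored as a LEB128-encoded unsigned integer, and that many bytes of payload.

```rust
/// Reads an unsigned LEB128 integer starting at `pos`.
/// Returns the decoded value and the position just past it.
fn read_u32_leb128(bytes: &[u8], mut pos: usize) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *bytes.get(pos)?; // None if we run off the end
        pos += 1;
        // The low 7 bits carry data; the high bit means "more bytes follow".
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((result, pos));
        }
        shift += 7;
        if shift >= 35 {
            return None; // more than 5 bytes cannot encode a u32
        }
    }
}

/// Lists `(section id, payload size)` for every section in a module,
/// skipping over each payload without looking inside it.
fn list_sections(module: &[u8]) -> Option<Vec<(u8, u32)>> {
    // Header: 4 magic bytes followed by a 4-byte version number.
    if module.len() < 8 || !module.starts_with(b"\0asm") {
        return None;
    }
    let mut sections = Vec::new();
    let mut pos = 8;
    while pos < module.len() {
        let id = module[pos];
        let (size, payload_start) = read_u32_leb128(module, pos + 1)?;
        let payload_end = payload_start.checked_add(size as usize)?;
        if payload_end > module.len() {
            return None; // section claims more bytes than the file has
        }
        sections.push((id, size));
        pos = payload_end; // hop to the next section
    }
    Some(sections)
}
```

Calling `list_sections` on the bytes of a module gives back the ID and size of each section in file order, which is about all we need to decide which sections to look at more closely later.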

The first job of our interpreter is really just going to be taking those sections and turning them into a representation we can evaluate.

For this first part of a web assembly interpreter, we’re just going to be focused on taking a web assembly program “simple.wasm” that exposes a single function “main” that returns a number.

#[no_mangle]
pub fn main(_args: usize, _len: usize) -> usize {
    return 42;
}

To make things more interesting we will be writing our web assembly interpreter in web assembly. We’ll be using the Rust programming language.

How to get our program bytes into our interpreter

Let’s first ask ourselves how we are even going to get our web assembly program “simple.wasm” into our interpreter. Since all browser applications start in JavaScript, we must look at how JavaScript loads web assembly.

async function loadAndRun() {
  let interpreter = await loadWebAssembly("interpreter.wasm");
  let simpleProgramBytes = await loadBytes("simple.wasm");
  let result = interpreter.run(simpleProgramBytes);
  window.alert("the result is " + result);
}

This looks like a fairly straightforward plan of attack. Let’s implement a few of these functions:

First, a way of getting bytes from a URL:

async function loadBytes(url) {
  var response = await fetch(url);
  var data = await response.arrayBuffer();
  return data;
}

Now let’s load a web assembly module:

async function loadWebAssembly(url) {
  let bytes = await loadBytes(url);
  let program = await WebAssembly.instantiate(bytes, {});
  return program.instance.exports;
}

However, we run into a problem if we try to run:

interpreter.run(simpleProgramBytes);

Web assembly cannot meaningfully take an ArrayBuffer as a parameter.

Functions exposed in web assembly can only take in numbers.

It’s about this point that most people studying web assembly go “Wait … what? How am I supposed to get anything inside my web assembly program?”

One option we can imagine goes like the following:

  1. Call a function on the module to allocate some space in memory for the data I want to put in and return a pointer (index to a location) in memory of where that block begins.
  2. Manually copy in the bytes of the data structure using JavaScript. This data structure will have to be something the web assembly program can take apart.
  3. Call a function passing in the pointer to the data structure

For our code that would look something like:

async function loadAndRun() {
  // let's get things loaded
  let interpreter = await loadWebAssembly("interpreter.wasm");
  let simpleProgramBytes = await loadBytes("simple.wasm");
  // create a view of our program as bytes
  let bytesToCopy = new Uint8Array(simpleProgramBytes);
  // allocate space in our interpreter for the program
  let ptr = interpreter.malloc(bytesToCopy.length);

  // get our view of interpreter memory only after malloc, because
  // malloc may grow memory, which invalidates views of the old buffer
  let memory = new Uint8Array(interpreter.memory.buffer);
  // copy the bytes of our program into interpreter memory
  memory.set(bytesToCopy, ptr);
  // signal our interpreter to run given the location
  // and length of our web assembly program we copied over
  let result = interpreter.run(ptr, bytesToCopy.length);
  window.alert("the result is " + result);
}

Great, we now have a way of loading the bytes of our “simple.wasm” program into our interpreter web assembly program! We see from our JavaScript that we need to write some functions inside our interpreter:

#[no_mangle]
fn malloc(size: usize) -> *mut u8 {
    ...
}
#[no_mangle]
fn run(ptr: usize, len: usize) -> f64 {
    ...
}

Writing an interpreter

Now to the real business! Let’s first just write a basic malloc:

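#[no_mangle]
fn malloc(size: usize) -> *mut u8 {
    // reserve room for `size` bytes (the vector's length is still 0)
    let mut buf = Vec::<u8>::with_capacity(size);
    let ptr = buf.as_mut_ptr();
    // leak the vector on purpose so its buffer stays allocated
    core::mem::forget(buf);
    ptr
}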

Basically we create a vector of bytes, get a pointer to its buffer, then forget about the vector so it doesn’t get de-allocated. Our task will be to re-create the vector when we go to interpret the app after JavaScript has loaded in the bytes of simple.wasm.

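#[no_mangle]
fn run(ptr: usize, len: usize) -> f64 {
    // rebuild the vector from the buffer malloc handed out;
    // JavaScript passes the pointer as a plain number, so cast it back
    let wasm_bytes = unsafe {
        Vec::from_raw_parts(ptr as *mut u8, len, len)
    };
    // Magic happens here!
    let result = ...

    // return the result of our interpreted execution of "run"
    result
}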
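
If you want to convince yourself that the forget-then-rebuild trick works before wiring up JavaScript, here is a rough sketch that plays both roles in plain Rust. The names `alloc_bytes`, `take_bytes`, and `round_trip` are hypothetical stand-ins (not the exported `malloc` and `run`), and `copy_nonoverlapping` stands in for the `memory.set` call JavaScript makes. It assumes the vector's real capacity equals the size we asked for, which is what `Vec::from_raw_parts` expects.

```rust
use std::mem;
use std::ptr;

/// Same idea as `malloc` above: reserve a buffer and leak it on purpose.
fn alloc_bytes(size: usize) -> *mut u8 {
    let mut buf = Vec::<u8>::with_capacity(size);
    let ptr = buf.as_mut_ptr();
    mem::forget(buf); // keep the buffer alive after this function returns
    ptr
}

/// Same idea as the start of `run`: rebuild the Vec to reclaim ownership.
///
/// Safety: `ptr` must come from `alloc_bytes(len)` and all `len` bytes
/// must have been written before calling this.
unsafe fn take_bytes(ptr: *mut u8, len: usize) -> Vec<u8> {
    unsafe { Vec::from_raw_parts(ptr, len, len) }
}

/// Plays the part of the JavaScript host: allocate, copy, hand over.
fn round_trip(program: &[u8]) -> Vec<u8> {
    let ptr = alloc_bytes(program.len());
    unsafe {
        // Stand-in for `memory.set(bytesToCopy, ptr)` on the JavaScript side.
        ptr::copy_nonoverlapping(program.as_ptr(), ptr, program.len());
        take_bytes(ptr, program.len())
    }
}
```

Because the rebuilt vector owns the buffer again, it is freed normally when it goes out of scope, so nothing stays leaked once the interpreter is done with the bytes.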

How cool is that! Now we have nothing in our way. Stay tuned for part 2, where we will dive into parsing out the sections of our very small web assembly program and dissecting only the parts relevant to calling main. Check out the code here.

[Update] Part 2 has been released here.
