RLBox

Overview

RLBox is a toolkit for sandboxing third-party C libraries that are used by C++ code (support for other languages is in the works). RLBox was originally developed for Firefox [1], which has been shipping with it since 2020.

The RLBox toolkit consists of:

  1. A C++ framework (RLBox) that makes it easy to retrofit existing application code to safely interface with sandboxed libraries.

  2. A Wasm backend (based on wasm2c) for isolating (sandboxing) C libraries.

In this section, we provide an overview of the RLBox framework, its reason for being, and a high-level sketch of how it works. Later, we walk through a tutorial that provides an end-to-end example of applying RLBox to a simple application.

Why RLBox

Work on RLBox began several years ago while we were attempting to add fine-grained isolation of third-party libraries to the Firefox renderer. Initially, we attempted this without support from a framework like RLBox, instead manually dealing with all the details of sandboxing, such as sanitizing untrusted inputs and reconciling ABI differences between the sandbox and the host application.

This went poorly: it was tedious, error-prone, and did nothing to abstract the details of the underlying sandbox from the developer. We had basically no hope that this would result in maintainable code, or that normal Mozilla developers unfamiliar with the gory details of our system would be able to sandbox a new library, let alone maintain existing ones.

So we scrapped this manual approach and built RLBox [1].

RLBox automates many of the low-level details of sandboxing and allows you, as a security engineer or application developer, to focus on just what you need to do to sandbox your particular application.

To sandbox a library, and thus move to a world where the library is no longer trusted, we need to modify the application-library boundary. For example, we need to add security checks in Firefox to ensure that any value from the sandboxed library is properly validated before it is used. Otherwise, the library (when compromised) may be able to abuse Firefox code to hijack its control flow [1]. The RLBox API is explicitly designed to make retrofitting existing application code simpler and less error-prone [2].

What does RLBox provide?

RLBox ensures that a sandboxed library is memory isolated from the rest of the application — the library cannot directly access memory outside its designated region — and that all boundary crossings are explicit. This ensures that the library cannot, for example, corrupt Firefox's address space. It also ensures that Firefox cannot inadvertently expose sensitive data to the library. The figure below illustrates this idea.

RLBox explicitly isolates the library data and control flow from the application

Memory isolation is enforced by the underlying sandboxing mechanism (e.g., Wasm [3]) from the start, when you create the sandbox with create_sandbox(). Explicit boundary crossings are enforced by RLBox (at compile time or at run time). For example, with RLBox you can't call library functions directly; instead, you must use the invoke_sandbox_function() method. Similarly, the library cannot call arbitrary Firefox functions; it can only call functions that you expose with the register_callback() method. (To simplify the sandboxing task, though, RLBox does expose a standard library, as described in the Standard Library chapter.)

When calling a library function, RLBox copies simple values into sandbox memory before calling the function. For larger data types, such as structs and arrays, you can't simply pass a pointer to the object. This would leak application addresses (undermining ASLR) and, more importantly, would not work: sandboxed code cannot access application memory. So you must explicitly allocate memory in the sandbox via malloc_in_sandbox() and copy application data into this region of memory (e.g., via strlcpy).
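To make the allocate-then-copy step concrete, here is a minimal, self-contained toy model. It is not RLBox's implementation: ToySandboxMemory and its methods are hypothetical names, and the model assumes the sandbox owns one contiguous region that sandboxed code addresses by offset (a "sandbox reference") rather than by application pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical toy model, not RLBox's implementation: the sandbox owns one
// contiguous memory region, and sandboxed code only understands offsets
// into that region, never application (host) pointers.
class ToySandboxMemory {
 public:
  explicit ToySandboxMemory(std::size_t size) : mem_(size) {}

  // Bump-allocate n bytes inside the sandbox region. Returns the offset of
  // the new block, or std::nullopt if the region is exhausted.
  std::optional<std::size_t> malloc_in_sandbox(std::size_t n) {
    if (n > mem_.size() - next_) return std::nullopt;
    std::size_t off = next_;
    next_ += n;
    return off;
  }

  // Copy n bytes of application data into the sandbox at offset off,
  // refusing any write that would leave the region.
  bool copy_into(std::size_t off, const void* src, std::size_t n) {
    if (off > mem_.size() || n > mem_.size() - off) return false;
    std::memcpy(mem_.data() + off, src, n);
    return true;
  }

  // Read-only view of sandbox memory (what the sandboxed code would see).
  const char* at(std::size_t off) const { return mem_.data() + off; }

 private:
  std::vector<char> mem_;
  std::size_t next_ = 0;  // invariant: next_ <= mem_.size()
};
```

A request that does not fit (in either allocation or copy) is refused instead of spilling into application memory.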

RLBox similarly copies simple return values and callback arguments. Larger data structures, however, must (again) be passed by sandbox-reference, i.e., via a reference/pointer to sandbox memory.

To ensure that application code doesn't unsafely use values that originate in the sandbox -- and may thus be under the control of an attacker -- RLBox considers all such values as untrusted and taints them. Tainted values are essentially opaque values (though RLBox does provide some basic operators on tainted values). To use a tainted value, you must unwrap it by (typically) copying the value into application memory -- and thus out of the reach of the attacker -- and verifying it. Indeed, RLBox forces application code to perform the copy and verification in sync using verification functions (see this).
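The copy-then-verify discipline can be sketched with a toy wrapper. ToyTainted is a hypothetical stand-in, far simpler than RLBox's tainted<T, sandbox_type>: it keeps the value private and exposes it only through a copy_and_verify() method that hands a copy to a caller-supplied verifier and returns whatever the verifier returns.

```cpp
#include <cassert>
#include <utility>

// Hypothetical toy model of tainting, not RLBox's tainted<T, Sbx>.
// The wrapped value is private, so code can only use it via a verifier.
template <typename T>
class ToyTainted {
 public:
  explicit ToyTainted(T v) : value_(v) {}

  // Take a copy first, then pass that copy to the verifier. In RLBox the
  // value may live in sandbox memory; verifying a copy means later uses see
  // exactly what was checked. Here value_ is simply a private member.
  template <typename Verifier>
  auto copy_and_verify(Verifier&& verifier) const {
    T copy = value_;
    return std::forward<Verifier>(verifier)(copy);
  }

 private:
  T value_;
};
```

The verifier decides both whether the value is acceptable and what the application gets back, e.g. a bool, or a clamped or sanitized value.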

References

Setting up your RLBox environment

RLBox currently spans two repositories. The first contains just the RLBox C++ framework, which you can grab with:

git clone git@github.com:PLSysSec/rlbox.git

The second, which contains our modified version of wasm2c and related backend tools for converting your C library into an isolated, sandboxed version, can be grabbed with:

git clone https://github.com/PLSysSec/rlbox_wasm2c_sandbox

This repo contains our modified version of wasm2c, a wasm runtime (and a very limited WASI runtime), and it pulls down a copy of the wasi-sdk and the RLBox framework as part of its build process. This provides a single location for all our tools, which is handy for examples and for getting started.

Many folks prefer to do a system-wide or per-user install of the wasi-sdk and RLBox. The latest releases of the wasi-sdk (which give you everything you need to compile your library from C to wasm) can be found here.

Quick Install

To quickly install the RLBox repo, you can run the following:

git clone git@github.com:PLSysSec/rlbox.git
cd rlbox
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Release
cmake --build ./build --config Release --parallel
cd build && sudo make install

To quickly install the RLBox tools repo, which includes everything you will need for the tutorial, you can run the following:

git clone https://github.com/PLSysSec/rlbox_wasm2c_sandbox
cd rlbox_wasm2c_sandbox
cmake -S . -B ./build
cmake --build ./build --target all

More detailed setup instructions can be found in the README of each repo.

The RLBox Tutorial

In this tutorial, we will walk you through the steps of adding sandboxing to a very simple application and library. However, all the basic steps generalize to more complex examples.

We have broken the tutorial into two parts. In the first part, we will look at how to use RLBox to retrofit sandboxing into an existing application, taking all the steps needed to ensure that control flow and data flow across the application-library boundary are secure.

In the second part, we will look at how to rebuild our library with wasm and link it into our application, so that our library is isolated.

Once we complete these two steps, our library is securely isolated from our application.

If you would like to download and run the examples yourself, follow the instructions here.

Retrofitting isolation in a simple application

For our tutorial, we're going to be sandboxing a tiny library mylib. While this library is very simple, it exercises key features of RLBox including: calling functions, copying strings into the sandbox, registering and handling callbacks from the library.

This first part of the tutorial focuses on modifying our application to add sandboxing; the next part focuses on recompiling our library with wasm to enforce isolation.

In this example, we're going to use the noop sandbox backend. The noop sandbox does not actually enforce isolation; it is simply a tool that makes it easier to port new libraries to RLBox. The noop sandbox does nothing more than turn our calls into the RLBox sandbox into normal function calls to the library that is already linked into our application.

The reason for this noop backend is that it supports incrementally porting our application. Instead of having to change all our library interfaces at once (to account for ABI differences between a sandbox and our normal library) and deal with the resulting headaches, we can gradually change our function calls from normal library calls to sandbox calls, and at each step test that our application continues to work as expected.
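The idea behind the noop backend fits in a few lines. The sketch below is hypothetical (ToyNoopSandbox and toy_add are illustrative names, not RLBox's rlbox_noop_sandbox): "invoking" a sandboxed function is just an ordinary call to a function already linked into the application. Note that the real noop backend still wraps return values in tainted types, which is what lets you get your types right before switching to a real sandbox; this toy omits that.

```cpp
#include <cassert>
#include <utility>

// Hypothetical sketch of a noop backend, not RLBox's rlbox_noop_sandbox:
// an "invocation" is simply a direct call into the linked-in library.
struct ToyNoopSandbox {
  template <typename Fn, typename... Args>
  static auto invoke(Fn* fn, Args&&... args) {
    return fn(std::forward<Args>(args)...);
  }
};

// Stand-in for a library function such as mylib's add().
unsigned toy_add(unsigned a, unsigned b) { return a + b; }
```

Because every call site already goes through invoke(), swapping in a backend that really isolates the library later changes what invoke() does, not the call sites themselves.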

Our example library

mylib declares four functions in mylib.h:

#pragma once

#ifdef __cplusplus
extern "C" {
#endif
    void hello();
    unsigned add(unsigned, unsigned);
    void echo(const char* str);
    void call_cb(void (*cb) (const char* str));
#ifdef __cplusplus
}
#endif

And implements those functions in mylib.c:

#include <stdio.h>
#include "mylib.h"

void hello() {
  printf("Hello from mylib\n");
}

unsigned add(unsigned a, unsigned b) {
  return a + b;
}

void echo(const char* str) {
  printf("echo: %s\n", str);
}

void call_cb(void (*cb) (const char* str)) {
  cb("hi again!");
}

Boilerplate

To get started, in our main application (main.cpp) let's first import the RLBox library and implement some necessary boilerplate:

// We're going to use RLBox in a single-threaded environment.
#define RLBOX_SINGLE_THREADED_INVOCATIONS
// All calls into the sandbox are resolved statically.
#define RLBOX_USE_STATIC_CALLS() rlbox_noop_sandbox_lookup_symbol

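#include <stdio.h>
#include <cassert>
#include <cstring>  // strlen, strncpy (used below)
#include <memory>   // unique_ptr (used in the callback)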
#include <rlbox/rlbox.hpp>
#include <rlbox/rlbox_noop_sandbox.hpp>

#include "mylib.h"

using namespace std;
using namespace rlbox;

// Define base type for mylib using the noop sandbox
RLBOX_DEFINE_BASE_TYPES_FOR(mylib, noop);

// Declare callback function we're going to call from sandboxed code.
void hello_cb(rlbox_sandbox_mylib& _, tainted_mylib<const char*> str);

int main(int argc, char const *argv[]) {
  // ... will fill in shortly ...
  // destroy sandbox
  sandbox.destroy_sandbox();

  return 0;
}

Why the boilerplate? RLBox supports different kinds of sandboxing backends. In practice, we start with the noop sandbox, which is not a real sandbox, to get our types right, and only at the end switch from noop to a real sandbox like Wasm. This, alas, means the RLBox types are typically generic in the sandbox type (e.g., rlbox::tainted<T, sandbox_type>); macros like RLBOX_DEFINE_BASE_TYPES_FOR define simpler types for us (e.g., we can use tainted_mylib<T>). In this simple example we only use the noop sandbox; we walk through how to modify this code to use Wasm in the next chapter.
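The following sketch shows the kind of alias such a macro saves you from writing by hand. Everything here is hypothetical (toy_tainted, toy_noop_sandbox, toy_wasm2c_sandbox, and toy_tainted_mylib are illustrative names), and it is not the actual expansion of RLBOX_DEFINE_BASE_TYPES_FOR, which defines more than one type (e.g., both rlbox_sandbox_mylib and tainted_mylib used in this chapter).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins, not RLBox's real types: a tainted wrapper that is
// generic in both the value type and the sandbox type, plus two empty tags
// that name sandbox backends.
template <typename T, typename Sandbox>
struct toy_tainted {
  T value;
};
struct toy_noop_sandbox {};
struct toy_wasm2c_sandbox {};

// Fix the backend in one place; the rest of the code says toy_tainted_mylib<T>.
using toy_mylib_backend = toy_noop_sandbox;  // later: toy_wasm2c_sandbox
template <typename T>
using toy_tainted_mylib = toy_tainted<T, toy_mylib_backend>;
```

Switching backends then amounts to editing the single toy_mylib_backend alias, loosely mirroring the RLBOX_DEFINE_BASE_TYPES_FOR(mylib, noop) to RLBOX_DEFINE_BASE_TYPES_FOR(mylib, wasm2c) change made later in this tutorial.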

Creating sandboxes and calling sandboxed functions

Now that the boilerplate is out of the way, let's create a new sandbox and call the hello function:

  // Declare and create a new sandbox
  rlbox_sandbox_mylib sandbox;
  sandbox.create_sandbox();

  // Call the library hello function:
  sandbox.invoke_sandbox_function(hello);

We do not call hello() directly. Instead, we use the invoke_sandbox_function() method. Once we turn on sandboxing, i.e., switch from the noop sandbox to Wasm, we won't be able to call the function directly either (e.g., because Wasm's ABI might be different from the app).

Calling sandboxed functions and verifying their return value

Let's now call the add function:

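  // call the add function; the result it returns is tainted
  auto val = sandbox.invoke_sandbox_function(add, 3, 4);
  // printing the value is harmless, so we explicitly (and visibly) remove the taint
  printf("Adding... 3+4 = %d\n", val.unverified_safe_because("only printing it"));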

  auto ok = sandbox.invoke_sandbox_function(add, 3, 4)
                   .copy_and_verify([](unsigned ret){
    printf("Adding... 3+4 = %d\n", ret);
    return ret == 7;
  });
  printf("OK? = %d\n", ok);

This call is a bit more interesting. First, we call add with arguments. Since these arguments are primitive types, RLBox doesn't impose any restrictions. Second, RLBox ensures that the unsigned value that add returns is tainted and thus cannot be used without verification. Here, we call the copy_and_verify() method, which copies the value into application memory and runs our verifier function:

[](unsigned ret){
      printf("Adding... 3+4 = %d\n", ret);
      return ret == 7;
}

This function (lambda) simply prints the copied value and returns true only if it is 7. Such checks matter: a compromised library could return any value, and if we used this value to, say, index an array, it could introduce an out-of-bounds memory access.
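To illustrate the out-of-bounds risk, here is a hypothetical sketch (verify_index, name_for, and kNames are illustrative names, not part of RLBox or mylib) in which a value returned by the library is used as an array index only after a verifier has checked that it is in bounds. In RLBox, logic like verify_index() would live in the lambda passed to copy_and_verify().

```cpp
#include <array>
#include <cassert>
#include <optional>

// Hypothetical application table indexed by a value that came from the library.
constexpr std::array<const char*, 8> kNames = {
    "zero", "one", "two", "three", "four", "five", "six", "seven"};

// Plays the role of a copy_and_verify() verifier for an untrusted index:
// accept it only if it is a valid index into kNames.
std::optional<unsigned> verify_index(unsigned untrusted) {
  if (untrusted < kNames.size()) return untrusted;
  return std::nullopt;  // reject: kNames[untrusted] would read out of bounds
}

// Only a verified index ever reaches operator[].
const char* name_for(unsigned untrusted) {
  auto idx = verify_index(untrusted);
  return idx ? kNames[*idx] : "invalid";
}
```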

Calling functions with (tainted) strings

Let's now call the echo function which takes a slightly more interesting argument: a string. Here, we can't simply pass a string literal as an argument: the sandbox cannot access application memory where this would be allocated. Instead, we must allocate a buffer in sandbox memory and copy the string we want to pass to echo into this region:

  // Call the library echo function
  const char* helloStr = "hi hi!";
  size_t helloSize = strlen(helloStr) + 1;
  tainted_mylib<char*> taintedStr = sandbox.malloc_in_sandbox<char>(helloSize);
  strncpy(taintedStr
            .unverified_safe_pointer_because(helloSize, "writing to region")
         , helloStr, helloSize);

Here, taintedStr is a tainted string: it lives in sandbox memory and could be written to by the (compromised) library code concurrently. In general, it's unsafe for us to use tainted data without verification, since it could be attacker-controlled. In this particular case, though, we just want to copy data (specifically, helloStr) into taintedStr. We do this by using unverified_safe_pointer_because() to essentially cast taintedStr to a char* without any verification. This is safe because we are only copying helloStr into sandbox memory: at worst, the sandboxed library can overwrite the memory region pointed to by taintedStr and crash when it tries to print it [1].

Note: Internally, unverified_safe_pointer_because() is not just a cast. It also ensures (1) that the pointer is within the sandbox and (2) that accessing helloSize bytes from the pointer stays within the sandbox boundary.

It's worth mentioning that the string "writing to region" has no special meaning in the code. Rather, the RLBox API asks you to provide a free-form string that acts as documentation: essentially, you complete the sentence "it is safe to remove the tainting from this type because ...". Such documentation may be useful to other developers who read your code. In the above example, a write to the sandbox region cannot cause a memory-safety error in the application, so it's safe to remove the taint.
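The two checks described in the note above can be sketched as a plain bounds check. This is an illustration under stated assumptions, not RLBox's code: range_in_sandbox() is a hypothetical helper that models the sandbox as the address range [base, base + size) and treats pointers as std::uintptr_t values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not RLBox's implementation. The sandbox region is
// modeled as the address range [base, base + size).
bool range_in_sandbox(std::uintptr_t base, std::size_t size,
                      std::uintptr_t ptr, std::size_t count) {
  // (1) The pointer itself must lie inside the sandbox region.
  if (ptr < base || ptr - base >= size) return false;
  // (2) Accessing count bytes from ptr must stay inside the region.
  //     Compare against the remaining space instead of computing ptr + count,
  //     which could overflow and wrap around.
  std::size_t remaining = size - (ptr - base);
  return count <= remaining;
}
```

Comparing count against the remaining space, rather than computing ptr + count, avoids an integer overflow that could otherwise let a huge count slip past the check.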

Now, we can just call the function and free the allocated string:

  sandbox.invoke_sandbox_function(echo, taintedStr);
  sandbox.free_in_sandbox(taintedStr);

Sneak peek of an upcoming feature: in an upcoming version of RLBox, transferring a buffer into the sandbox will be much simpler with a new TransferBuffer abstraction. For a preview, take a look at its usage in Firefox.

Registering and handling callbacks

Finally, let's call the call_cb function. To do this, let's first define a callback for the function to call. We declared our callback in the boilerplate, but never defined the function. So let's do that at the end of the file:

void hello_cb(rlbox_sandbox_mylib& _, tainted_mylib<const char*> str) {
  auto checked_string =
    str.copy_and_verify_string([](unique_ptr<char[]> val) {
        // Use an explicit check rather than assert(): assert() is compiled
        // out in Release (NDEBUG) builds, which would silently skip the check.
        if (val == nullptr || strlen(val.get()) >= 1024) {
          return unique_ptr<char[]>(nullptr);
        }
        return move(val);
    });
  if (checked_string != nullptr) {
    printf("hello_cb: %s\n", checked_string.get());
  }
}

This callback is called with a tainted string. To actually use the tainted string, we need to verify it. To do this, we use the string verification function copy_and_verify_string() with a simple verifier:

    str.copy_and_verify_string([](unique_ptr<char[]> val) {
        if (val == nullptr || strlen(val.get()) >= 1024) {
          return unique_ptr<char[]>(nullptr);
        }
        return move(val);
    });

This verifier checks that the string is not null and that its length is less than 1 KB. In the callback, we then simply print the string.
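What has to happen before a string verifier ever runs can be approximated as follows. This is a hypothetical sketch, not RLBox's implementation of copy_and_verify_string(): copy_and_verify_string_toy() models sandbox memory as a std::vector<char> and the tainted pointer as an offset into it, and it folds the tutorial's verifier policy (non-null, shorter than a limit) into the same function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical sketch: copy a NUL-terminated string out of sandbox memory,
// then apply a length policy. Returns nullptr if any check fails.
std::unique_ptr<char[]> copy_and_verify_string_toy(
    const std::vector<char>& sandbox_mem, std::size_t off,
    std::size_t max_len) {
  // The "pointer" (offset) must point inside the sandbox region.
  if (off >= sandbox_mem.size()) return nullptr;
  const char* start = sandbox_mem.data() + off;
  std::size_t avail = sandbox_mem.size() - off;
  // Search for the terminating NUL only within the region, so a missing
  // terminator cannot make us read past the region's end.
  const void* nul = std::memchr(start, '\0', avail);
  if (nul == nullptr) return nullptr;
  std::size_t len =
      static_cast<std::size_t>(static_cast<const char*>(nul) - start);
  // Verifier policy from the tutorial: reject overly long strings.
  if (len >= max_len) return nullptr;
  // Copy into application memory (including the NUL); use only this copy.
  auto copy = std::make_unique<char[]>(len + 1);
  std::memcpy(copy.get(), start, len + 1);
  return copy;
}
```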

Let's now continue back in main. To call call_cb with our callback, we first need to register the callback (otherwise RLBox will disallow the library-to-application call) and then pass it to the call_cb function:

  // register callback and call it
  auto cb = sandbox.register_callback(hello_cb);
  sandbox.invoke_sandbox_function(call_cb, cb);
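Why registration is required can be illustrated with a toy callback table. The sketch is hypothetical (ToyCallbackTable and ToyCallback are illustrative names, and RLBox's real register_callback() returns a callback object rather than an integer): the application hands the sandbox an opaque handle, and any call through a handle it never issued is refused.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not RLBox's implementation: sandboxed code never
// holds a raw application function pointer, only a handle issued here.
using ToyCallback = void (*)(const std::string&);

class ToyCallbackTable {
 public:
  // Register an application function and return the handle that may be
  // passed into the sandbox (loosely analogous to register_callback()).
  std::size_t register_callback(ToyCallback cb) {
    table_.push_back(cb);
    return table_.size() - 1;
  }

  // Called when sandboxed code tries to call back into the application.
  // Returns false (and calls nothing) for handles that were never issued.
  bool call_from_sandbox(std::size_t handle, const std::string& arg) const {
    if (handle >= table_.size()) return false;
    table_[handle](arg);
    return true;
  }

 private:
  std::vector<ToyCallback> table_;
};
```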

Build and run

If you haven't installed RLBox, see the Install chapter.

Clone this book's repository:

git clone git@github.com:PLSysSec/rlbox-book.git
cd rlbox-book/src/chapters/examples/noop-hello-example

Build:

cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Release
cmake --build ./build --config Release --parallel

Run:

$ ./build/main
Hello from mylib
Adding... 3+4 = 7
Adding... 3+4 = 7
OK? = 1
echo: hi hi!
hello_cb: hi again!

[1] For single-threaded applications, the attacker can't overwrite the pointer, because we don't call into the sandbox between allocating taintedStr and calling strncpy.

Adding isolation with the Wasm sandbox backend

The noop backend makes it easy to add security checks. However, it does not enforce isolation. To finish sandboxing our library, we need to:

  1. Update the application main.cpp to use the wasm2c sandbox backend instead of noop.

  2. Compile our library (e.g., mylib.c) to wasm (i.e., mylib.wasm), adding isolation to our library.

  3. Compile the resulting mylib.wasm file to C (mylib.wasm.c and mylib.wasm.h) with the wasm2c compiler, which allows it to be compiled and linked with our application.

  4. Compile and link the sandboxed library and our application.

We will look at each of these steps next.

Switching to the wasm2c backend

Shown below is a diff between main.cpp in our first example (using the noop sandbox backend) and in our current example (using the wasm2c backend).

1   $diff ../noop-hello-example/main.cpp main.cpp
2   3a4
3   >
4   5c6
5   < #define RLBOX_USE_STATIC_CALLS() rlbox_noop_sandbox_lookup_symbol
6   ---
7   > #define RLBOX_USE_STATIC_CALLS() rlbox_wasm2c_sandbox_lookup_symbol
8   9,11d9
9   < #include <rlbox/rlbox.hpp>
10  < #include <rlbox/rlbox_noop_sandbox.hpp>
11  <
12  12a11,14
13  > #include "mylib.wasm.h"
14  > #include "rlbox.hpp"
15  > #include "rlbox_wasm2c_sandbox.hpp"
16  >
17  18c20
18  < RLBOX_DEFINE_BASE_TYPES_FOR(mylib, noop);
19  ---
20  > RLBOX_DEFINE_BASE_TYPES_FOR(mylib, wasm2c);

As you can see, most of the changes simply rename a few key instances of noop to wasm2c, most notably to change our backend type on line 20.

The only other change is #include "mylib.wasm.h" on line 13, which brings in the new header for our sandboxed library generated by wasm2c.

These are essentially all the changes you need to make to your application to switch to the wasm2c backend.

The changes to how RLBox is included on lines 14 and 15 are just an artifact of differences in how our examples are built.

Our example Makefile

Doing all of these steps one command at a time would be terribly tedious. Instead, we automate them with a simple Makefile. Let's take a look at our full Makefile, then walk through each part.

RLBOX_ROOT=../rlbox_wasm2c_sandbox

#RLBOX headers
RLBOX_INCLUDE=$(RLBOX_ROOT)/build/_deps/rlbox-src/code/include


#Our Wasi-SDK
WASI_SDK_ROOT=$(RLBOX_ROOT)/build/_deps/wasiclang-src/

#location of our wasi/wasm runtime
WASM2C_RUNTIME_PATH=$(RLBOX_ROOT)/build/_deps/mod_wasm2c-src/wasm2c
WASI_RUNTIME_FILES=$(addprefix $(WASM2C_RUNTIME_PATH), /wasm-rt-impl.c /wasm-rt-os-win.c  /wasm-rt-os-unix.c  /wasm-rt-wasi.c)

WASI_CLANG=$(WASI_SDK_ROOT)/bin/clang
WASI_SYSROOT=$(WASI_SDK_ROOT)/share/wasi-sysroot
WASM2C=$(RLBOX_ROOT)/build/_deps/mod_wasm2c-src/bin/wasm2c

#Flags for compiling our library to wasm with wasi-clang (used in Step 1)
WASM_CFLAGS=-Wl,--export-all -Wl,--no-entry -Wl,--growable-table -Wl,--stack-first -Wl,-z,stack-size=1048576

all: mylib.wasm mylib.wasm.c myapp

clean:
	rm -rf mylib.wasm mylib.wasm.c mylib.wasm.h myapp *.o

#Step 1: build our library into wasm, using clang from the wasi-sdk
mylib.wasm: mylib.c
	$(WASI_CLANG) --sysroot $(WASI_SYSROOT) $(WASM_CFLAGS) dummy_main.c mylib.c -o mylib.wasm

#Step 2: use wasm2c to convert our wasm to a C implementation of wasm we can link with our app.
mylib.wasm.c: mylib.wasm
	$(WASM2C) mylib.wasm -o mylib.wasm.c

#Step 3: compiling and linking our application with our library
myapp: mylib.wasm.c
	$(CC) -c $(WASI_RUNTIME_FILES) -I$(RLBOX_INCLUDE) -I$(RLBOX_ROOT)/include -I$(WASM2C_RUNTIME_PATH) mylib.wasm.c
	$(CXX) main.cpp -o myapp -I$(RLBOX_INCLUDE) -I$(RLBOX_ROOT)/include -I$(WASM2C_RUNTIME_PATH) *.o

Definitions

To start we can see our Makefile begins with RLBOX_ROOT:

RLBOX_ROOT=../rlbox_wasm2c_sandbox

This specifies where the root directory of our rlbox_wasm2c_sandbox repo lives. This repo contains all the tools we need to build our sandboxed library, e.g., wasm2c, our wasi-sdk (which CMake downloads), RLBox, etc.

Step 1: Compiling our library to Wasm

mylib.wasm: mylib.c
	$(WASI_CLANG) --sysroot $(WASI_SYSROOT) $(WASM_CFLAGS) dummy_main.c mylib.c -o mylib.wasm

Here we build our library to wasm. Typically, you will just want to update your build system to use the wasi-sdk clang (or wasi-clang) as your compiler. wasi-clang links wasi-libc (a custom version of musl) with your library instead of the system libc. The wasi-libc library and headers live in $(WASI_SYSROOT).

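Also noteworthy are the $(WASM_CFLAGS), which are important to ensure that our output plays nicely with the rest of the toolchain (for example, -Wl,--export-all exports all of the library's functions from the wasm module, and -Wl,-z,stack-size=1048576 gives the module a 1 MiB stack).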

Notice the dummy_main.c file, which keeps wasi-clang happy; you can find a copy at rlbox_wasm2c_sandbox/_src/wasm2c_sandbox_wrapper.c.

At the moment, wasi-libc has a variety of limitations, such as lack of pthread support (though this should be fixed soon!). Anything platform-specific, such as OS-specific system calls (or just system calls that WASI doesn't support) or platform-specific code such as inline assembly, will also fail at this stage.

Step 2: Using wasm2c to generate our sandboxed library

mylib.wasm.c: mylib.wasm
	$(WASM2C) mylib.wasm -o mylib.wasm.c

Here we use our fork of wasm2c to generate mylib.wasm.c, a C file that implements our sandboxed library and can be linked with an application.

The wasi runtime that ships with wasm2c at present implements only a subset of the Wasi API and denies all access to the file system and network.

Note: While RLBox currently only works with our fork of wasm2c we hope to upstream our changes to wasm2c in the near future.

Step 3: Compiling and linking our application with our library

myapp: mylib.wasm.c
	$(CC) -c $(WASI_RUNTIME_FILES) -I$(RLBOX_INCLUDE) -I$(RLBOX_ROOT)/include -I$(WASM2C_RUNTIME_PATH) mylib.wasm.c
	$(CXX) main.cpp -o myapp -I$(RLBOX_INCLUDE) -I$(RLBOX_ROOT)/include -I$(WASM2C_RUNTIME_PATH) *.o

Setting up our tutorial environment

To run the examples in our two-part tutorial, create an example directory where our example app and tools will live, and enter that directory:

mkdir example
cd example

Next, clone the repo for this book, which contains our example code, copy the example into its own directory, and build it.

git clone git@github.com:PLSysSec/rlbox-book.git
mkdir myapp
cp -r rlbox-book/src/chapters/examples/noop-hello-example/* myapp
cd myapp
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Release
cmake --build ./build --config Release --parallel

Then run it to make sure everything is working.

./build/main

You should see the following output:

Hello from mylib
Adding... 3+4 = 7
Adding... 3+4 = 7
OK? = 1
echo: hi hi!
hello_cb: hi again!

Finally, return to the example directory, then clone and build our wasm toolchain, which includes our fork of wasm2c and the wasi-sdk (everything you need to compile your C code to wasm).

cd ..
git clone https://github.com/PLSysSec/rlbox_wasm2c_sandbox
cd rlbox_wasm2c_sandbox
cmake -S . -B ./build
cmake --build ./build --target all
cd ..

Additional material

  • The best example of how to use RLBox is its use in Firefox. The Firefox code search is a great way to explore it.

  • Working through the simple library example repo is a good next step: it gives you a feel for retrofitting a simple application that uses a potentially buggy library. The solution is available in the solution folder of the same repo.

  • Documentation of the core RLBox APIs.

  • Short tutorial on using the RLBox APIs. Note that this tutorial uses the old Lucet Wasm compiler.

  • The RLBox test suite itself has a number of examples.

  • Finally, the original academic paper explaining RLBox and its use in Firefox [RLBoxPaper], published at USENIX Security 2020, and the accompanying video explanations are a good way to get an overview of RLBox.