Overview

RLBox is a toolkit for sandboxing third-party C libraries that are used by C++ code (support for other languages is in the works). RLBox was originally developed for Firefox¹, which has been shipping with it since 2020.

The RLBox toolkit consists of:

A C++ framework (RLBox) that makes it easy to retrofit existing application code to safely interface with sandboxed libraries.
An RLBox plugin that allows the use of the wasm2c compiler for isolating (sandboxing) C libraries with Wasm.

In this section, we provide an overview of the RLBox framework, its reason for being, and a high-level sketch of how it works. In the next section, a tutorial walks through an end-to-end example of applying RLBox to a simple application.

Why RLBox

Work on RLBox began several years ago while attempting to add fine grain isolation to third party libraries in the Firefox renderer. Initially we attempted this process without any support from a framework like RLBox, instead attempting to manually deal with all the details of sandboxing such as sanitizing untrusted inputs, and reconciling ABI differences between the sandbox and host application.

This went poorly; it was tedious, error prone, and did nothing to abstract the details of the underlying sandbox from the developer. We had basically no hope that this would result in code that was maintainable, or that normal Mozilla developers who were unfamiliar with the gory details of our system would be able to sandbox a new library, let alone maintain existing ones.

So we scrapped this manual approach and built RLBox¹.

RLBox automates many of the low level details of sandboxing and allows you, as a security engineer or application developer, to instead focus just on what you need to do to sandbox your particular application.

To sandbox a library, and thus to move to a world where the library is no longer trusted, we need to modify the application-library boundary. For example, we need to add security checks in Firefox to ensure that any value from the sandboxed library is properly validated before it is used. Otherwise, the library (when compromised) may be able to abuse Firefox code to hijack its control flow.¹ The RLBox API is explicitly designed to make retrofitting of existing application code simpler and less error-prone.²

What does RLBox provide?

RLBox ensures that a sandboxed library is memory isolated from the rest of the application — the library cannot directly access memory outside its designated region — and that all boundary crossings are explicit. This ensures that the library cannot, for example, corrupt Firefox's address space. It also ensures that Firefox cannot inadvertently expose sensitive data to the library. The figure below illustrates this idea.

RLBox explicitly isolates the library data and control flow from the application

Memory isolation is enforced by the underlying sandboxing mechanism (e.g., using Wasm³) from the start, when you create the sandbox with create_sandbox(). Explicit boundary crossings are enforced by RLBox (at compile time, at run time, or both). For example, with RLBox you can't call library functions directly; instead, you must use the invoke_sandbox_function() method. Similarly, the library cannot call arbitrary Firefox functions; instead, it can only call functions that you expose with the register_callback() method. (To simplify the sandboxing task, though, RLBox does expose a standard library as described in the Standard Library.)

When calling a library function, RLBox copies simple values into the sandbox memory before calling the function. For larger data types, such as structs and arrays, you can't simply pass a pointer to the object. This would leak ASLR and, more importantly, would not work: sandboxed code cannot access application memory. So, you must explicitly allocate memory in the sandbox via malloc_in_sandbox() and copy application data to this region of memory (e.g., via strlcpy).
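To make the "allocate in the sandbox, then copy" rule concrete, here is a minimal sketch of the idea. It is a toy model, not RLBox's implementation: the class ToySandboxMemory and its methods alloc, copy_string_in, and view are hypothetical names, and sandbox "pointers" are modeled as offsets into a single region (which is also why no host address leaks to the library).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical toy model, NOT RLBox's implementation: sandbox memory is one
// contiguous region, and "sandbox pointers" are plain offsets into it.
class ToySandboxMemory {
public:
  explicit ToySandboxMemory(std::size_t size) : region_(size, 0) {}

  // Bump-allocate n bytes inside the region. Returns 0 ("null") on failure.
  std::size_t alloc(std::size_t n) {
    if (n == 0 || next_ > region_.size() || n > region_.size() - next_) return 0;
    std::size_t offset = next_;
    next_ += n;
    return offset;
  }

  // Copy a host string into [dst, dst + dst_size), always null-terminating.
  // Refuses any destination that is not fully inside the region.
  bool copy_string_in(std::size_t dst, std::size_t dst_size, const char* src) {
    if (dst == 0 || dst_size == 0) return false;
    if (dst >= region_.size() || dst_size > region_.size() - dst) return false;
    std::size_t len = std::strlen(src);
    std::size_t n = len < dst_size - 1 ? len : dst_size - 1;
    std::memcpy(&region_[dst], src, n);
    region_[dst + n] = '\0';
    return true;
  }

  // What sandboxed code would read at the given offset.
  const char* view(std::size_t offset) const { return &region_[offset]; }

private:
  std::vector<char> region_;
  std::size_t next_ = 1;  // offset 0 is reserved to mean "null"
};

inline void toy_copy_in_example() {
  ToySandboxMemory mem(64);
  const char* hello = "hi hi!";
  std::size_t size = std::strlen(hello) + 1;
  std::size_t dst = mem.alloc(size);
  assert(dst != 0 && mem.copy_string_in(dst, size, hello));
  assert(std::strcmp(mem.view(dst), hello) == 0);
}
```

The bounds check in copy_string_in stands in for the kind of check a framework has to perform so that a copy can never land outside the sandbox region.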

RLBox similarly copies simple return values and callback arguments. Larger data structures, however, must (again) be passed by sandbox-reference, i.e., via a reference/pointer to sandbox memory.

To ensure that application code doesn't unsafely use values that originate in the sandbox - and may thus be under the control of an attacker - RLBox considers all such values as untrusted and taints them. Tainted values are essentially opaque values (though RLBox does provide some basic operators on tainted values). To use a tainted value, you must unwrap it by (typically) copying the value into application memory - and thus out of the reach of the attacker - and verifying it. Indeed, RLBox forces application code to perform the copy and verification together using verification functions (see "Untainting values" in the tutorial).
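The access rule for tainted values can be sketched with a small wrapper type. This is only an illustration under simplifying assumptions: ToyTainted and checked_length are hypothetical names, and RLBox's real tainted types are far more general than this.

```cpp
#include <cassert>
#include <utility>

// Hypothetical, simplified stand-in for a tainted value. It only models the
// access rule: the wrapped value is reachable solely through a verifier.
template <typename T>
class ToyTainted {
public:
  explicit ToyTainted(T value) : value_(value) {}

  // Copy the value first, then hand the copy to the verifier. The verifier
  // decides what (if anything) the application gets to use.
  template <typename Verifier>
  auto copy_and_verify(Verifier&& verifier) const {
    T copy = value_;  // later changes to value_ cannot affect this copy
    return std::forward<Verifier>(verifier)(copy);
  }

private:
  T value_;  // deliberately no getter and no conversion operator to T
};

// Example verifier: clamp an untrusted length to a known maximum.
inline unsigned checked_length(const ToyTainted<unsigned>& untrusted) {
  return untrusted.copy_and_verify([](unsigned len) {
    return len > 4096u ? 4096u : len;
  });
}

inline void toy_tainted_example() {
  ToyTainted<unsigned> from_sandbox(1000000u);
  assert(checked_length(from_sandbox) == 4096u);
}
```

Because the wrapper has no getter, forgetting the verification step shows up as a compile error rather than as a silent use of attacker-controlled data.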

References

Retrofitting Fine Grain Isolation in the Firefox Renderer by S. Narayan, et al.

The Road to Less Trusted Code: Lowering the Barrier to In-Process Sandboxing by T. Garfinkel et al.

WebAssembly and Back Again: Fine-Grained Sandboxing in Firefox 95 by B. Holley

The RLBox Tutorial

In this tutorial we will walk you through the steps of adding sandboxing to a very simple application and library. However, all the basic steps generalize to more complex examples.

We will start by describing the simple application that uses a library, and then describe how to sandbox this in two parts.

In the first part, we will look at how to use RLBox to retrofit sandboxing in an existing application, taking all the steps to ensure that control flow and data flow across the application library boundary are secure.
In the second part, we will look at how to re-build our library with wasm and link this into our application, so our library is isolated.

Once we complete these two steps, our library is now securely isolated from our application.

Downloading and running the examples

To get the source code for the examples in the tutorial, download the repo as shown below:

git clone https://github.com/PLSysSec/rlbox-book

The chapters going forward will give commands on how to build and run these examples.

The examples in this tutorial will be self-contained and will pull those repos as needed. However, for reference, RLBox currently spans two repositories: one that contains just the RLBox C++ framework, and another that contains the RLBox plugin for Wasm files compiled with the wasm2c compiler (which converts your C library to an isolated and sandboxed version). The two repos are available here:

The core RLBox library is available at https://github.com/PLSysSec/rlbox
The Wasm2c RLBox plugin is available at https://github.com/PLSysSec/rlbox_wasm2c_sandbox

The example in this tutorial

For our tutorial, we're going to be sandboxing a small application that uses a library called mylib.

Our example library

mylib declares four functions in mylib.h:

#pragma once

#ifdef __cplusplus
extern "C" {
#endif

void hello();
unsigned int add(unsigned int, unsigned int);
void echo(const char* str);
void call_cb(void (*cb) (const char* str));

#ifdef __cplusplus
}
#endif

And implements those functions in mylib.c:

#include <stdio.h>
#include "mylib.h"

void hello() {
  printf("Hello from mylib\n");
}

unsigned int add(unsigned int a, unsigned int b) {
  return a + b;
}

void echo(const char* str) {
  printf("echo: %s\n", str);
}

void call_cb(void (*cb) (const char* str)) {
  cb("hi again!");
}

While this library is very simple, it will allow us to exercise key features of RLBox in the next chapters, including calling functions, copying strings into the sandbox, and registering and handling callbacks from the library.

Our example application

The main application in main.cpp simply invokes each of these functions in turn.

#include <stdio.h>
#include <stdlib.h>

#define release_assert(cond, msg) if (!(cond)) { fputs(msg "\n", stderr); abort(); }

#include "mylib.h"

using namespace std;

// Declare callback function that's going to be invoked from the library.
void hello_cb(const char* str);

int main(int argc, char const *argv[]) {
  // Call the library hello function
  hello();

  int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  // Call the library add function
  auto ret = add(3, 4);
  auto array_val = array[ret];
  printf("Got array value %d\n", array_val);

  // Call the library echo function
  const char* helloStr = "hi hi!";
  echo(helloStr);

  // Call the library function call_cb, passing in the callback hello_cb
  call_cb(hello_cb);

  return 0;
}

void hello_cb(const char* str)
{
  release_assert(str != nullptr, "Expected value for string");
  printf("hello_cb: %s\n", str);
}

To build this example on your machine, run the following commands

cd rlbox-book/src/examples/hello-example
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Debug
cmake --build ./build --config Debug --parallel

Then run it to make sure everything is working.

./build/main

You should see the following output:

Hello from mylib
Got array value 7
echo: hi hi!
hello_cb: hi again!

Retrofitting isolation in our example

This first part of the tutorial focuses on modifying our application to add sandboxing; the next part focuses on recompiling our library with Wasm to enforce isolation.

In this example, we're going to use the noop sandbox backend. The noop sandbox does not actually enforce isolation; it is simply a tool that makes it easier to port new libraries to RLBox. It does nothing more than turn our calls into the RLBox sandbox into normal function calls to the library we already have linked into our application.

The reason for this noop backend is that it supports incrementally porting our application. Instead of having to change all our library interfaces at once (to account for ABI differences between a sandbox and our normal library) and deal with the resulting headaches, we can gradually change our function calls from normal library calls to sandbox calls, and at each step test that our application continues to work as expected.

Creating a noop sandbox

To get started, in our main application (main.cpp) let's first import the RLBox library and add some necessary boilerplate at the top of the file:

// We're going to use RLBox in a single-threaded environment.
#define RLBOX_SINGLE_THREADED_INVOCATIONS
// The fixed configuration line we need to use for the noop sandbox.
// It specifies that all calls into the sandbox are resolved statically.
#define RLBOX_USE_STATIC_CALLS() rlbox_noop_sandbox_lookup_symbol

#include "rlbox.hpp"
#include "rlbox_noop_sandbox.hpp"

using namespace rlbox;

// Define base type for mylib using the noop sandbox
RLBOX_DEFINE_BASE_TYPES_FOR(mylib, noop);

Why the boilerplate?

RLBox supports different kinds of sandboxing backends/plugins, so we need to specify which backend we will use and how it is configured. While a lot of effort has gone into avoiding boilerplate, there is a certain amount that either cannot be removed or would be too costly to remove; those are the bits you are seeing here.

What does this boilerplate do?

Let's briefly go through the boilerplate to understand the purpose of each piece.

#define RLBOX_SINGLE_THREADED_INVOCATIONS tells RLBox that only one thread in our host application will invoke functions in the sandboxed library at a given time. (There can be multiple application threads that all invoke functions, but the host application must ensure that only one thread executes functions in the sandboxed library at a time. This can be done with a per-sandbox lock.) This macro allows RLBox to elide several internal mutex calls, which greatly speeds up its performance. If you want to support multiple threads calling into the same sandbox concurrently, omit this macro.

#define RLBOX_USE_STATIC_CALLS() rlbox_noop_sandbox_lookup_symbol tells RLBox that the noop sandbox makes static function calls into the library. For technical reasons, not all sandboxes support direct function calls, so some sandbox backends may rely on indirect function calls via pointers into the sandboxed library. Unfortunately, this is something RLBox must know up front, so we have to specify it. For all practical purposes, just think of this line as something you should specify as is. Each sandbox backend/plugin will have a version of this line in its documentation that you can copy as is.

We have #included two headers: rlbox.hpp, which is the base RLBox library, and rlbox_noop_sandbox.hpp, which provides the noop sandbox backend.

RLBOX_DEFINE_BASE_TYPES_FOR defines tainted types that are specific for each library. In our example, this macro now gives us tainted_mylib<T>, which will automatically map to rlbox::tainted<T, rlbox_noop_sandbox>. If we change the sandbox plugin/backend in the future, the mapping will change automatically.
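The per-sandbox lock mentioned for RLBOX_SINGLE_THREADED_INVOCATIONS could be sketched as follows. This is one possible shape, not something RLBox provides: LockedSandbox, with_sandbox, and ToyCounterSandbox are hypothetical names, and ToyCounterSandbox merely stands in for a real sandbox object.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical wrapper, not an RLBox type: pairs a sandbox-like object with
// a mutex so that only one thread at a time can call into it.
template <typename Sandbox>
class LockedSandbox {
public:
  // Run fn(sandbox) while holding the per-sandbox lock.
  template <typename Fn>
  auto with_sandbox(Fn&& fn) {
    std::lock_guard<std::mutex> guard(lock_);
    return std::forward<Fn>(fn)(sandbox_);
  }

private:
  std::mutex lock_;
  Sandbox sandbox_{};
};

// Stand-in for a sandbox whose functions must not run concurrently.
struct ToyCounterSandbox {
  int calls = 0;
  int invoke() { return ++calls; }
};

inline void locked_sandbox_example() {
  LockedSandbox<ToyCounterSandbox> shared;
  // Each application thread would go through with_sandbox() like this.
  int first = shared.with_sandbox([](ToyCounterSandbox& s) { return s.invoke(); });
  int second = shared.with_sandbox([](ToyCounterSandbox& s) { return s.invoke(); });
  assert(first == 1 && second == 2);
}
```

Note that std::mutex is not recursive, so with this design a callback that re-enters with_sandbox from inside the lambda would deadlock; whether that matters depends on how your application uses callbacks.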

Creating and destroying sandboxes

Now that the boilerplate is out of the way, let's create a new sandbox instance at the top of the main function that we will use in this application.

  // Declare and create a new sandbox
  rlbox_sandbox_mylib sandbox;
  sandbox.create_sandbox();

and destroy the sandbox at the end of main

  // destroy sandbox
  sandbox.destroy_sandbox();

Note: We can create multiple sandbox instances if we wanted. You can think of each sandbox instance as an isolated instance of the library. Each instance cannot interfere with another instance.

To see where this could be useful, consider securing a web server that parses XML data from each incoming connection. You could sandbox the XML parsing library and spin up a single sandbox. This would ensure the server doesn't get compromised due to an XML parsing bug; however, it won't prevent one malicious connection from interfering with the parsed XML contents of a different connection. Instead, you could spin up a new sandboxed XML-parser library instance for each incoming connection. This architecture would guarantee that a bug triggered while processing XML in one connection will not spill over into the processing of other connections.

Sandboxing function calls

We now move on to sandboxing the function calls made by the application to mylib. We can see that the application calls the hello function in the library.

  hello();

Sandboxing this call is as simple as changing the syntax to:

  sandbox.invoke_sandbox_function(hello);

We have changed our code to not call hello() directly. Instead, we use RLBox's invoke_sandbox_function() method. This allows RLBox to mediate the function calls into the sandbox.
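To give a feel for what "mediating" a call can mean in the noop case, here is a toy sketch that forwards a call to an ordinary function and wraps the result. All names (ToyTainted, ToyNoopSandbox, invoke, toy_add) are hypothetical, and this is not how RLBox is implemented; it only mirrors the idea that a noop backend turns sandbox calls into normal function calls while still tainting what comes back. It assumes a C++17 compiler.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical toy types for illustration; none of this is RLBox code.
template <typename T>
struct ToyTainted {
  T unverified;  // a real framework would hide this behind verification
};

struct ToyNoopSandbox {
  // Forward the call to an ordinary function and taint any result.
  template <typename Ret, typename... Params, typename... Args>
  auto invoke(Ret (*fn)(Params...), Args&&... args) {
    if constexpr (std::is_void_v<Ret>) {
      fn(std::forward<Args>(args)...);
    } else {
      return ToyTainted<Ret>{fn(std::forward<Args>(args)...)};
    }
  }
};

// An ordinary "library" function, linked directly into the application.
inline unsigned toy_add(unsigned a, unsigned b) { return a + b; }

inline void toy_invoke_example() {
  ToyNoopSandbox sandbox;
  ToyTainted<unsigned> ret = sandbox.invoke(toy_add, 3u, 4u);
  assert(ret.unverified == 7u);
}
```

Routing every call through a single entry point like this is what lets a framework change the return type (and, with a real backend, the calling mechanism) without touching each call site again.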

Calling sandboxed functions and verifying their return value

Let's now sandbox the call to the add function. We can see that our application calls the add function as shown below.

  int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  // Call the library add function
  auto ret = add(3, 4);
  auto array_val = array[ret];

To change the function call, we will use invoke_sandbox_function as before.

  auto ret = sandbox.invoke_sandbox_function(add, 3, 4);

There are now a couple of interesting things happening.

First, add has arguments. These arguments are primitive types, so RLBox doesn't impose any restrictions and you can pass them to the sandboxed function as is (complex arguments have some restrictions as we will see later).
Second, if you check the type of ret, you'll see that RLBox ensures that the return value from add is tainted and thus cannot be used without verification. Concretely, the type of ret is now tainted_mylib<unsigned int>. Thus, if you try to compile this program, you will get a compilation error stating that a value of type tainted_mylib<unsigned int> cannot be used as an array index.

To convert ret back to an unsigned int, we will have to verify it by calling the copy_and_verify() API. This API copies the value into application memory and runs a verifier function that we have to specify. The verifier should ensure that ret does not contain an unexpected value. For now, let's add just the call to copy_and_verify() so we can unwrap ret from a tainted_mylib<unsigned int> to an unsigned int without worrying about the verifier.

  auto ret = sandbox.invoke_sandbox_function(add, 3, 4)
                   .copy_and_verify([](unsigned val){
    // .. to be specified ..
    return val;
  });

In the next chapter, we will discuss what we can put in the verifier to ensure the safety of ret.

Untainting values

Continuing our example, we need to figure out what values of ret are safe, and write a verifier that checks that ret has one of these safe values.

So the question is: What do we put as the verifier for ret to remain safe?

Perhaps unsurprisingly, the answer here is "it depends". However, the intuition is: the safety check you should put in the verifier should ensure that ret has a value that does not cause a memory safety issue in the rest of the program.

Let's continue with our example of verifying ret, which is the tainted return value from add.

Let's look at how ret is used to figure this out.

  int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  // Call the library add function
  auto ret = add(3, 4);
  auto array_val = array[ret];

In this example, this is simple. We can see that ret is only used in one place:

  auto array_val = array[ret];

Thus, we simply need to ensure that ret is a valid index into array, i.e., that it is smaller than the size of array (10 elements). We can ensure this by writing the following:

  auto ret = sandbox.invoke_sandbox_function(add, 3, 4)
                   .copy_and_verify([](unsigned val){
    release_assert(val < 10, "Unexpected result");
    return val;
  });

Here, release_assert is just a small macro in the file that calls abort() if the check fails.
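If the same check is needed in several places, one option is to factor it into a named function that can be unit-tested on its own. The sketch below assumes nothing about RLBox: verify_index and lookup_example are hypothetical helpers, and the check inside is the same bounds test that would go into the copy_and_verify() lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper, not part of RLBox: a named verifier for values that
// will be used to index an array of N elements. It aborts on a bad index,
// in the spirit of the tutorial's release_assert macro.
template <std::size_t N>
unsigned verify_index(unsigned val) {
  if (val >= N) {
    std::fputs("Unexpected index from sandbox\n", stderr);
    std::abort();
  }
  return val;
}

inline int lookup_example(unsigned untrusted_index) {
  int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  // Only the verified index is ever used to touch the array.
  unsigned idx = verify_index<10>(untrusted_index);
  return array[idx];
}

inline void verify_index_example() {
  assert(lookup_example(7) == 7);  // 3 + 4, as in the tutorial's add() call
}
```

Tying the bound to the template parameter keeps the check next to the size it protects, which may make it easier to spot a mismatch during review.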

What happens if we get the verifier wrong?

Unfortunately, there is nothing RLBox can do to make sure that your verifier is correct. Ultimately, verifiers are part of your trusted code base (TCB), and you have to get them right. However, one upside of RLBox is that all verifiers are clearly marked, making them easier to check during a security audit. It is also possible that static analysis tools can be configured to sanity check the verifiers.

What are the various untainting APIs that I can use for different types?

We discuss this in more detail in the advanced topics chapter.

What happens if we can't figure out a verifier?

We discuss this in more detail in the advanced topics chapter.

Handling tainted strings

Let's now sandbox the call to the echo function which takes a slightly more interesting argument: a string. Here, we can't simply pass a string literal as an argument: the sandbox cannot access application memory where this would be allocated. RLBox thus prevents this code from compiling.

To fix the compilation error, we must allocate a buffer in sandbox memory and copy the string we want to pass to echo into this region. We can do this with the following code:

  // Call the library echo function
  const char* helloStr = "hi hi!";
  size_t helloSize = strlen(helloStr) + 1;
  tainted_mylib<char*> taintedStr = sandbox.malloc_in_sandbox<char>(helloSize);
  strncpy(sandbox, taintedStr, helloStr, helloSize);

Here taintedStr is a tainted string: it lives in the sandbox memory and could be written to by the (compromised) library code concurrently. We have allocated it by calling the malloc_in_sandbox API.

After this, we have to copy the string into the sandbox. Normally, we could copy strings with strncpy; however, the destination here is of type tainted<char*>, which the standard strncpy does not accept. To make this simpler, RLBox provides an rlbox::strncpy which allows passing tainted strings as destinations. The only difference in the signature of rlbox::strncpy compared to strncpy is that the first parameter must be the sandbox. Internally, RLBox ensures that the string copies remain within the sandbox boundary.

Note: if you do need to convert a tainted pointer to a raw pointer, you can do so by following the approach described in the advanced topics chapter.

Now, we can just call the function and free the allocated string:

  sandbox.invoke_sandbox_function(echo, taintedStr);
  sandbox.free_in_sandbox(taintedStr);

Handling callbacks

Finally, let's sandbox the call to the call_cb function. To do this, we need to modify the callback to have a signature that RLBox permits. Currently the callback looks like this:

void hello_cb(const char* str)
{
  release_assert(str != nullptr, "Expected value for string");
  printf("hello_cb: %s\n", str);
}

To modify this to a signature RLBox will allow, we need to

Set the first parameter to be a reference to the sandbox
Make all parameters and return values tainted. (A void return does not need to be tainted.)

With this change, the callback will now look like

void hello_cb(rlbox_sandbox_mylib& sandbox, tainted_mylib<const char*> str)

This callback is called with a tainted string. To actually use the tainted string we need to verify it. To do this, we use the string verification function copy_and_verify_string() with a simple verifier:

    str.copy_and_verify_string([](unique_ptr<char[]> val) {
        release_assert(val != nullptr && strlen(val.get()) < 1024, "val is null or greater than 1024\n");
        return move(val);
    });

This verifier moves the string if it is not null and if its length is less than 1KB. In the callback we simply print this string.
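The two halves of string verification (copy out of untrusted memory, then check the copy) can be sketched without RLBox as follows. The helpers copy_out, verify_short_string, and checked_callback_arg are hypothetical names; the limit parameter is an assumption standing in for however many bytes may safely be read from the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Copy a string out of untrusted memory into application-owned storage.
// At most `limit` bytes are read, and the copy is always null-terminated.
inline std::unique_ptr<char[]> copy_out(const char* src, std::size_t limit) {
  if (src == nullptr) return nullptr;
  std::size_t len = 0;
  while (len < limit && src[len] != '\0') ++len;
  auto copy = std::make_unique<char[]>(len + 1);  // zero-initialized
  std::memcpy(copy.get(), src, len);
  return copy;
}

// Hypothetical named verifier with the same checks as the tutorial's lambda:
// the copied string must be non-null and shorter than 1 KB.
inline std::unique_ptr<char[]> verify_short_string(std::unique_ptr<char[]> val) {
  if (val == nullptr || std::strlen(val.get()) >= 1024) {
    std::fputs("val is null or too long\n", stderr);
    std::abort();
  }
  return val;  // by-value parameters are moved implicitly on return
}

inline std::string checked_callback_arg(const char* src, std::size_t limit) {
  auto checked = verify_short_string(copy_out(src, limit));
  return std::string(checked.get());
}

inline void string_verifier_example() {
  const char raw[4] = {'h', 'i', '!', '?'};  // no terminator in the source
  assert(checked_callback_arg(raw, 3) == "hi!");
}
```

Checking the copy rather than the original matters because the original stays in memory the library can still write to after the check has passed.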

Let's now continue back in main. To call call_cb with the callback, we first need to register the callback (otherwise RLBox will disallow the library-to-application call) and then pass the registered callback to the call_cb function:

  // register callback
  auto cb = sandbox.register_callback(hello_cb);
  // Call the library function call_cb, passing in the callback hello_cb
  sandbox.invoke_sandbox_function(call_cb, cb);

Note that cb here is an RAII type, meaning the callback is automatically unregistered when cb goes out of scope. If you want the callback to be registered for longer, make sure to keep cb alive.
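The lifetime behavior of such a handle can be illustrated with a small stand-alone model. ToyCallbackRegistry and its nested Handle are hypothetical names and do not reflect RLBox's internal callback table; the sketch only shows registration being tied to the handle's scope.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical registry standing in for a sandbox's callback table.
class ToyCallbackRegistry {
public:
  // RAII handle: the slot stays registered only while the handle is alive.
  class Handle {
  public:
    Handle(ToyCallbackRegistry& reg, int slot) : reg_(&reg), slot_(slot) {}
    Handle(Handle&& other) noexcept : reg_(other.reg_), slot_(other.slot_) {
      other.reg_ = nullptr;  // the moved-from handle no longer owns the slot
    }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
    Handle& operator=(Handle&&) = delete;
    ~Handle() {
      if (reg_ != nullptr) reg_->slots_.erase(slot_);
    }
    int slot() const { return slot_; }

  private:
    ToyCallbackRegistry* reg_;
    int slot_;
  };

  Handle register_callback() {
    int slot = next_slot_++;
    slots_.insert(slot);
    return Handle(*this, slot);
  }

  bool is_registered(int slot) const { return slots_.count(slot) != 0; }
  std::size_t active() const { return slots_.size(); }

private:
  std::set<int> slots_;
  int next_slot_ = 0;
};

inline void raii_callback_example() {
  ToyCallbackRegistry registry;
  int slot = -1;
  {
    auto cb = registry.register_callback();
    slot = cb.slot();
    assert(registry.is_registered(slot));  // usable while cb is alive
  }
  assert(!registry.is_registered(slot));   // unregistered at scope exit
}
```

In practice this means a handle created as a local variable should outlive every library call that might still invoke the callback, for example by storing it in a longer-lived object.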

Building and running

To build this example on your machine, run the following commands

cd rlbox-book/src/examples/noop-hello-example
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Debug
cmake --build ./build --config Debug --parallel

Then run it to make sure everything is working.

./build/main

You should see the following output:

Hello from mylib
Got array value 7
echo: hi hi!
hello_cb: hi again!

Enforcing isolation with the Wasm2c sandbox

The noop backend makes it easy to add security checks. However, it does not enforce isolation. To finish sandboxing your library, we will need to:

Update the application main.cpp to use the wasm2c sandbox backend instead of noop.
Compile our library (mylib.c) to Wasm (mylib.wasm), which adds isolation to the library.
Compile the resulting mylib.wasm file to C (mylib.wasm.c and mylib.wasm.h) with the wasm2c compiler, which allows it to be compiled and linked with our application.

We will look at each of these steps next and end with instructions on how you can try this out.

Modifying the application to use the wasm2c RLBox plugin/backend

Making this change is very simple with RLBox. In fact, it can be done exclusively in the boilerplate. Here is the boilerplate to use the wasm2c backend.

// We're going to use RLBox in a single-threaded environment.
#define RLBOX_SINGLE_THREADED_INVOCATIONS
// The fixed configuration line we need to use for the wasm2c sandbox.
// It specifies that all calls into the sandbox are resolved statically.
#define RLBOX_USE_STATIC_CALLS() rlbox_wasm2c_sandbox_lookup_symbol
// The rlbox wasm2c plugin requires that you provide the wasm2c module's name
#define RLBOX_WASM2C_MODULE_NAME mylib

// Include the produced header from wasm2c
#include "mylib.wasm.h"
#include "rlbox.hpp"
#include "rlbox_wasm2c_sandbox.hpp"

using namespace rlbox;

// Define base type for mylib using the wasm2c sandbox
RLBOX_DEFINE_BASE_TYPES_FOR(mylib, wasm2c);

You'll probably notice that there are only a handful of changes.

We now use rlbox_wasm2c_sandbox.hpp instead of rlbox_noop_sandbox.hpp
The sandbox type which is the second parameter to the macro RLBOX_DEFINE_BASE_TYPES_FOR has now changed to wasm2c from noop
The boilerplate for RLBOX_USE_STATIC_CALLS has changed to use the wasm2c backend's boilerplate
The wasm2c backend/plugin requires an extra piece of boilerplate which is the name of the wasm module as specified in the macro RLBOX_WASM2C_MODULE_NAME
The wasm2c backend/plugin requires the produced mylib.wasm.h (we'll discuss how to produce this in the next section) to be included in the file

These are mostly mechanical changes and are straightforward. Modifying the build is perhaps slightly more challenging as building wasm libraries involves multiple steps.

Modifying the build to produce the Wasm sandboxed library

To show how we will update the build, we will use two CMakeLists.txt files as a reference

The CMakeLists used by the noop sandbox
The CMakeLists used by the wasm2c sandbox

As you can see the wasm CMakeLists is quite a bit longer. Below, we will give a high-level overview of the steps, so you can follow what is happening in the Wasm build.

To build and use the Wasm sandboxed library, we need several additional repos/tools

We will need the rlbox wasm2c plugin/backend
We will need a version of clang that can produce Wasm files, specifically, Wasm files that target the WebAssembly System Interface (WASI). WASI is a group of standards-track API specifications designed to provide a secure standard interface for Wasm applications. Specifically, you need WASI if you want to use printf, timers, or anything else that makes a syscall. We will thus rely on the wasi-sdk, which provides wasi-clang, a version of clang that can target WASI, and wasi-libc (a custom version of musl libc modified for this use case).
Finally, we will use the wasm2c Wasm compiler. Wasm files need to be compiled into native libraries that can be linked into your application. Unlike regular native libraries, however, libraries produced by a sandboxing compiler are guaranteed to be sandboxed. The wasm2c compiler in particular compiles a Wasm file by first transpiling it to C (the produced C is basically machine code with a lot of sandboxing checks, and is not going to be readable), and then compiling the resulting C with a regular C compiler to produce native objects.

After we download these repos, we can then take the following steps

Build the wasm2c sandbox compiler and runtime. This is a project that can be built using CMake. You can read more about how to build wasm2c in their readme
We need to compile our mylib.c to mylib.wasm using wasi-clang. The command in the CMakeLists.txt that does this is
```
${wasiclang_SOURCE_DIR}/bin/clang
   --sysroot ${wasiclang_SOURCE_DIR}/share/wasi-sysroot/
   -O3
   -Wl,--export-all -Wl,--no-entry -Wl,--growable-table -Wl,--stack-first -Wl,-z,stack-size=1048576 -Wl,--export-table
   -o ${MYLIB_WASM}
   ${C_DUMMY_MAIN}
   ${CMAKE_SOURCE_DIR}/mylib.c
```
There are a number of flags that start with -Wl that must be specified so we produce a Wasm file with the properties we'd expect. You can read more about these flags in the Wasm lld docs page. The output file ${MYLIB_WASM} corresponds to mylib.wasm and the input C files are mylib.c and ${C_DUMMY_MAIN}. ${C_DUMMY_MAIN} as the name indicates is an empty main function seen here. You could avoid this dummy main by using Wasm's reactor flag.
Next we need to run wasm2c to transpile our wasm file back to a C file with checks. This is fairly straightforward, and you can read more about it in the wasm2c documentation
We now have to compile the transpiled wasm file. The process for doing this is described in detail in the wasm2c repo. Broadly, we need to compile the transpiled files with the wasm2c runtime and appropriate includes. This will now generate our native sandboxed library mylib
We can now build our application using mylib

Building and running the wasm2c backend

To build this example on your machine, run the following commands

cd rlbox-book/src/examples/wasm-hello-example
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Debug
cmake --build ./build --config Debug --parallel

Then run it to make sure everything is working.

./build/main

You should see the following output:

Hello from mylib
Got array value 7
echo: hi hi!
hello_cb: hi again!

Advanced RLBox topics

In this chapter, we will go beyond the basic tutorial to understand some more advanced patterns you will encounter when using RLBox.

Tainted computations

For safety reasons, RLBox does not permit the following operations on tainted values:

Using a tainted<unsigned int> to index an array in the host, i.e., int[]. This can lead to an out of bounds array access if allowed.
Branching on a tainted value, i.e., if conditions with tainted values in the condition, for loops with tainted values in the condition. The only exception here is comparing a tainted pointer to nullptr. So assuming ret is a tainted<int> , code of the form if (ret == 4) {...} would not be allowed. This is because any scenario where an application's control flow is determined by a tainted value is inevitably going to result in security issues. Thus, RLBox returns tainted<bool> during comparisons like ret == 4 which do not allow branching or loops. You can, of course, try to verify the tainted<bool> using copy_and_verify, or better, verify ret prior to comparison to avoid this issue.
Any form of comparison on the result of dereferencing a tainted pointer. Concretely, code of the form
```
tainted<int*> a = ...;
tainted<bool> b = (*a == 4);
```
would not compile. This is because dereferencing a tainted pointer refers to a location in the sandbox's memory, and the sandbox could change that at any moment. To represent the volatility of this comparison, RLBox returns a new type tainted_bool_hint with the result of this comparison.
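The branching restriction can be illustrated with a toy wrapper whose comparisons stay tainted. This is a simplified model with hypothetical names (ToyTainted, is_four), not RLBox's tainted<T>, and it assumes a C++17 compiler.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical toy wrapper (not RLBox's tainted<T>) showing how comparison
// results can stay tainted so they cannot drive control flow directly.
template <typename T>
class ToyTainted {
public:
  explicit ToyTainted(T v) : value_(v) {}

  // Comparing yields another tainted value, never a plain bool.
  ToyTainted<bool> operator==(const T& rhs) const {
    return ToyTainted<bool>(value_ == rhs);
  }

  template <typename Verifier>
  auto copy_and_verify(Verifier verifier) const {
    return verifier(value_);
  }

private:
  T value_;
};

// `if (ret == 4)` would not compile here: a ToyTainted<bool> cannot be
// turned into a bool, implicitly or explicitly.
static_assert(!std::is_constructible_v<bool, ToyTainted<bool>>,
              "tainted comparison results must not convert to bool");

inline bool is_four(const ToyTainted<int>& ret) {
  // Verify first, then branch on the untainted value. Any int is acceptable
  // in this toy case because the value is only compared, never used to
  // index or size anything.
  int checked = ret.copy_and_verify([](int v) { return v; });
  return checked == 4;
}
```

The static_assert documents the property that matters: the only route from a tainted comparison to a branch goes through a verifier.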

Untainting different types

As you sandbox more library APIs, you will soon have to start verifying (removing the tainting) objects of different types. Below, we'll go through the APIs used to remove tainting in different scenarios.

In all examples, there are a few things to keep in mind:

We will use the comment <insert security checks here> to indicate that you need to apply domain-specific checks to ensure that you only permit values that do not cause memory safety problems in the rest of your program. The tutorial goes through an example of adding security checks in more detail.
Note that verifiers can return values of any type, or no value at all. This gives some flexibility in how you want to manage your data. So, for example, you may be untainting an int, but can return an unsigned int, an int*, void, etc.
We will refer to the term fundamental types frequently. C/C++ defines the fundamental types as the types built-in to the language, such as int, float etc.

Untainting fundamental types

These types can be untainted with a verifier of the following form

  tainted<int> a = ... ; //
  int a_verified = a.copy_and_verify([](int val){
    // <insert security checks here>
    return val;
  });

There may be some scenarios where the application can handle any possible integer from the sandbox, i.e., the <insert security checks here> can be left empty.

An example of this could be when you are calling a sandboxed library function that returns an integer 0 on success and a non-zero error code otherwise. From the host application's perspective, you may do something if the returned value is 0, and exit otherwise.

In this scenario, RLBox provides a shorthand API called unverified_safe_because which can be used as follows.

  tainted<int> a = ... ; //
  int a_verified = a.unverified_safe_because("Error code. App is robust to all values of error code");

The unverified_safe_because API takes a string argument that allows the developer to document why they are doing this and why it's safe. This string does not have any special meaning in the code. Rather, the RLBox API asks you to provide a free-form string that acts as documentation. Essentially, you are providing a string that completes the sentence "it is safe to remove the tainting from this value because ...". Such documentation may be useful to other developers who read your code. The above is equivalent to:

  tainted<int> a = ... ; //
  int a_verified = a.copy_and_verify([](int val){
    // Error code. App is robust to all values of error code
    return val;
  });

Untainting byte buffers

Sometimes the sandbox returns a buffer that you want to untaint. For example, you may use a sandboxed image decoder to parse JPEG images. In this example, after the sandboxed library parses the image, it will produce a byte buffer with image pixels, probably of the type tainted<char*> or some other tainted pointer to a fundamental type.

These types can be untainted with a verifier of the following form

  tainted<int*> a = ... ; //
  std::unique_ptr<int[]> a_verified = a.copy_and_verify_range([&](std::unique_ptr<int[]> val){
    // <insert security checks here>
    return val;
  }, size);

This API copies size elements out of the sandbox and applies the necessary checks, such as ensuring that the entire buffer comes from sandbox memory.

In the example of a sandboxed image decoder, the buffer may hold completely random data instead of pixels of the image. It is up to you to figure out what your application context is and what security checks need to be in place. In the example of a sandboxed image decoder, the usual expectation is that there is no sensible way to check that decoding has occurred correctly, and rather the rest of your program should be robust to showing an incorrect image on the screen. In this case, there would be no security check in place.
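
For buffers whose contents do drive later decisions, some checking is usually needed. The sketch below assumes, purely for illustration, a length-prefixed buffer in which element 0 claims how many payload elements follow. The function name verify_length_prefixed and the buffer layout are hypothetical and not part of RLBox; the logic is the kind of check that could go inside the verifier passed to copy_and_verify_range.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical verifier body for a length-prefixed buffer: element 0 claims
// how many payload elements follow. `count` is the number of elements that
// were copied out of the sandbox. Returns nullptr when the claim is invalid.
inline std::unique_ptr<int[]> verify_length_prefixed(std::unique_ptr<int[]> buf,
                                                     std::size_t count) {
  if (!buf || count == 0) {
    return nullptr;  // nothing was copied
  }
  const int claimed = buf[0];
  // Reject negative lengths and lengths that overrun the copied range.
  if (claimed < 0 || static_cast<std::size_t>(claimed) > count - 1) {
    return nullptr;
  }
  return buf;
}
```

The point of the check is that a length field written by the sandbox is itself untrusted, so it has to be compared against the number of elements the host actually copied.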

Untainting byte buffers without copying

There may be some scenarios where the application can handle any possible byte buffer from the sandbox, i.e., the <insert security checks here> can be left empty.

An example of this would be if the image produced by our sandboxed libjpeg example is, say, an application background whose contents do not really matter.

In this case, we may want to use the byte buffer without making a copy, to avoid overheads.

RLBox provides an API for this called unverified_safe_pointer_because, which can be used as follows.

tainted<char*> a = ...;
char* raw = a.unverified_safe_pointer_because(10, "Demo of a raw pointer");

unverified_safe_pointer_because takes two parameters. The first is the number of bytes behind this pointer that you will be accessing. RLBox needs this to ensure that this many bytes starting at the pointer stay within the sandbox boundary. The second is a string that allows the developer to document why they are doing this and why it's safe. This string does not have any special meaning in the code. Rather, the RLBox API asks you to provide a free-form string that acts as documentation. Essentially, you are providing a string that says "it is safe to remove the tainting from this type because...". Such documentation may be useful to other developers who read your code.
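
The bounds requirement described above can be pictured with a small self-contained sketch. This is a conceptual model only, not RLBox's implementation: the function range_in_sandbox and its parameters are hypothetical, and it assumes the sandbox memory is a single contiguous region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Conceptual model only (not RLBox's implementation): returns true when the
// `size` bytes starting at `ptr` lie entirely inside a sandbox whose memory
// spans [base, base + sandbox_size).
inline bool range_in_sandbox(std::uintptr_t base, std::size_t sandbox_size,
                             std::uintptr_t ptr, std::size_t size) {
  if (ptr < base) {
    return false;  // starts before the sandbox
  }
  const std::uintptr_t offset = ptr - base;
  if (offset > sandbox_size) {
    return false;  // starts after the sandbox
  }
  // Compare against the remaining space instead of computing ptr + size,
  // which could overflow and wrap around.
  return size <= sandbox_size - offset;
}
```

This is why the byte count matters: a pointer that starts inside the sandbox says nothing about whether the bytes after it are also inside.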

Untainting C-strings

Untainting C-strings of type tainted<char*> is covered in the tutorial.

These types can be untainted with a verifier of the following form

  tainted<char*> str = ...;
  std::unique_ptr<char[]> checked_string =
    str.copy_and_verify_string([](std::unique_ptr<char[]> val) {
        // <insert security checks here>
        return std::move(val);
    });

The API ensures that the tainted string lives within the sandbox and is null terminated, and makes a copy of the string that you can use.

A useful check in <insert security checks here> is also to limit the size of the string you want to allow.
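
A minimal sketch of such a size limit follows. The function name verify_bounded_string and the max_len parameter are hypothetical and not part of RLBox; the body is the kind of logic that could go inside the verifier passed to copy_and_verify_string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical verifier body: keeps the copied string only if it is at most
// `max_len` characters long (excluding the terminator). Relies on the copy
// being null terminated, which the text above says the API ensures.
inline std::unique_ptr<char[]> verify_bounded_string(std::unique_ptr<char[]> val,
                                                     std::size_t max_len) {
  if (!val) {
    return nullptr;  // no string to inspect
  }
  if (std::strlen(val.get()) > max_len) {
    return nullptr;  // too long for this application
  }
  return val;
}
```

Callers then need to treat a null result as a rejected string rather than as an empty one.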

Untainting one-level pointers to fundamental types

If you have a tainted pointer to a fundamental type such as tainted<int*>, tainted<float*> etc., these types can be untainted with a verifier of the following form

  tainted<int*> a = ... ; //
  std::unique_ptr<int> a_verified = a.copy_and_verify([](std::unique_ptr<int> val){
    // <insert security checks here>
    return val;
  });

The idea here is that RLBox is effectively creating a deep clone of the object after doing the required checks of ensuring the pointer is in the sandbox. We would ideally allow this API for more types, but C++ makes it hard to know when we can reasonably perform a deep clone of an object, and hence this API is limited to tainted pointers to fundamental types.

This API is also limited to one-level pointers, i.e., things like tainted<int*> and is not allowed for tainted<int**>.

Untainting just the "address bits" of a pointer

Your application may sometimes need just the raw bits of a tainted pointer without needing to look at the data being pointed to. An example of this would be if you want to maintain a hashmap of pointers in the class, but the pointers are produced by the sandbox.

tainted<int*> foo = sandbox.invoke_sandbox_function(...);

std::map<int*, int> my_map;
my_map[foo] = 3; // RLBox gives a compiler error

For this scenario, RLBox provides an API called copy_and_verify_address which takes a verifier that accepts a uintptr_t. This API can be used as follows.

tainted<int*> foo = sandbox.invoke_sandbox_function(...);

std::map<uintptr_t, int> my_map; // keyed by the verified address bits
uintptr_t foo_verified = foo.copy_and_verify_address([&](uintptr_t addr) {
    // <insert security checks here>
    return addr;
});
my_map[foo_verified] = 3;
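
When the host later receives an address from the sandbox and wants to consult such a map, one reasonable pattern is to look the address up without inserting. The sketch below is an illustration under that assumption; AddressTable and lookup_address are hypothetical names and not part of RLBox.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical host-side bookkeeping keyed by the address bits of pointers
// that the sandbox produced. Only the addresses are stored; the host never
// dereferences them.
using AddressTable = std::map<std::uintptr_t, int>;

// Looks up a verified address without inserting anything. Using find()
// instead of operator[] means an unexpected address from the sandbox cannot
// silently create a new entry.
inline std::optional<int> lookup_address(const AddressTable& table,
                                         std::uintptr_t addr) {
  const auto it = table.find(addr);
  if (it == table.end()) {
    return std::nullopt;
  }
  return it->second;
}
```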

Untainting C-arrays of fundamental types

Untainting C-arrays of fundamental types like tainted<int[3]> is possible using copy_and_verify. Note, however, that the API expects the verifier to accept an argument of std::array, so that the data is correctly copied.

These types can be untainted with a verifier of the following form

  tainted<int[3]> a = ... ; //
  std::array<int, 3> a_verified = a.copy_and_verify([](std::array<int, 3> val){
    // <insert security checks here>
    return val;
  });
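
As one possible security check for this shape of data, the sketch below treats the three integers as an RGB triple and clamps each component into the range the host expects. The interpretation as RGB and the name verify_rgb are assumptions made for illustration; they are not part of RLBox.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical verifier body: treats the three integers as an RGB triple and
// clamps each component to the 0..255 range the host expects.
inline std::array<int, 3> verify_rgb(std::array<int, 3> val) {
  for (int& component : val) {
    component = std::clamp(component, 0, 255);
  }
  return val;
}
```

Clamping suits data where any in-range value is acceptable; where an out-of-range value signals a compromised library, rejecting the whole array may be the better choice.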

What happens if we can't figure out a verifier?

In larger codebases, it may not be easy to find a suitable verifier for a tainted value such as ret. This may be because ret is used in a lot of places and it is difficult to figure out all the locations where it is used. Broadly, there are a few different strategies to deal with this, which we describe below. Ultimately, you probably want to use a mix of these strategies to make this a tractable problem.

Strategy 1: Defer verification

The first strategy to simplify identification of verifiers is to defer verification. In fact, as a general rule you should do this wherever possible. The more you defer verification, the further into your program you move the verifier. And the further into your program you are, the clearer it usually is what a tainted value is used for, and thus the easier it is to write the verifier.

To make this easier, RLBox allows a number of operations on tainted values directly (specifically, in scenarios where RLBox can ensure their safety).

For example, if you add a line in the application

auto ret_plus_1 = ret + 1;

and attempt to compile your code, you will see that the compiler does not report any error on this line. This is because RLBox permits this operation and ret_plus_1 is now a tainted_mylib<unsigned int>, i.e., RLBox has propagated the tainting, a "tainted computation".

Indeed there are a number of operations that are supported as "tainted computations", and produce a new tainted value. As a few examples:

Arithmetic on a tainted value.
Dereferencing a tainted pointer (a tainted pointer always points to memory in the sandbox. RLBox automatically checks this prior to the dereference to ensure safety).
Comparing a tainted pointer to nullptr
Using a tainted<unsigned int> to index a tainted<int[]>

There are also operations that are not allowed for safety reasons

Using a tainted<unsigned int> to index an array in the host, i.e., int[]. This can lead to an out of bounds array access if allowed.
Branching on a tainted value, i.e., if conditions with tainted values in the condition, for loops with tainted values in the condition.

You can learn more about this in the advanced topics chapter

Trying to figure out what operations are allowed or not may seem tricky, but there is a straightforward approach. Try the operation! If RLBox doesn't throw a compilation error, you can be assured it is safe and you can defer this.
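
To build intuition for how such compile-time enforcement can work, here is a toy model of taint propagation in plain C++. It is a sketch for illustration only and is not RLBox's implementation; the class name toy_tainted is hypothetical and supports far fewer operations than RLBox does.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Toy model of taint propagation for illustration only. This is not RLBox's
// implementation; the name toy_tainted is hypothetical.
template <typename T>
class toy_tainted {
 public:
  explicit toy_tainted(T value) : value_(value) {}

  // Arithmetic keeps the result wrapped, so the taint propagates.
  toy_tainted operator+(T rhs) const { return toy_tainted(value_ + rhs); }

  // The only way to obtain the raw value is through a verifier.
  template <typename Verifier>
  auto copy_and_verify(Verifier&& verifier) const {
    return std::forward<Verifier>(verifier)(value_);
  }

 private:
  T value_;
};

// No conversion to bool or to the raw type exists, so branching on a
// toy_tainted value or assigning it to a plain variable does not compile.
static_assert(!std::is_constructible_v<bool, toy_tainted<unsigned>>);
static_assert(!std::is_convertible_v<toy_tainted<unsigned>, unsigned>);
```

In this model, allowed operations are simply the ones the wrapper defines, and everything else is rejected by the compiler, which mirrors the "try the operation" advice above.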

Strategy 2: Verification for local use

Another option is to simply verify a tainted variable for one use case at a time. Rather than verifying the tainted value for the rest of the program, verify it for the next use case only, and do not remove the tainting.
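
A minimal sketch of this idea, assuming the tainted value is about to be used as an index into one particular host table: the check below is valid for that table only. The helper read_at_untrusted_index is hypothetical and not part of RLBox; in real code its logic would run inside a verifier, and the tainted variable itself would stay tainted for every other use.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper that validates an untrusted index for exactly one use:
// reading from `table`. The caller keeps its tainted variable and must verify
// again before any other use.
inline std::optional<int> read_at_untrusted_index(const std::vector<int>& table,
                                                  std::size_t untrusted_index) {
  if (untrusted_index >= table.size()) {
    return std::nullopt;  // out of bounds for this particular table
  }
  return table[untrusted_index];
}
```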

Strategy 3: Enforce the library contract

Finally, in scenarios where a library's security contract is clearly defined for an output, you could use this contract as a verifier for the tainted data as soon as it is returned by a sandboxed function.

Getting raw pointers into the sandbox memory

Typically, RLBox does not let you create raw pointers into sandbox memory, i.e., pointers of the form char*. Rather, the pointers will be wrapped as tainted<char*>. However, there may be certain scenarios where you really need a raw pointer into sandbox memory.

You can do this with the unverified_safe_pointer_because API. This converts a tainted pointer to a raw pointer with only minimal verification of checking that the pointer is within the sandbox boundary.

The details of how to use this API are provided in the section "Untainting byte buffers without copying".

Miscellaneous troubleshooting

Assigning 0 or NULL to tainted pointers is not supported

Unfortunately, NULL in C++ is typed as int, which makes it indistinguishable from any other integer. So, RLBox does not allow zeroing out pointers with 0 or NULL. You can, however, assign a null pointer using the C++ nullptr keyword.
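
The underlying language behavior can be seen in a small sketch: generic code that deduces the type of its argument sees a literal 0 as an int, exactly like any other integer, while nullptr has its own distinct type. The helper deduces_as_nullptr is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: reports whether template deduction saw the argument
// as std::nullptr_t.
template <typename T>
constexpr bool deduces_as_nullptr(T) {
  return std::is_same_v<T, std::nullptr_t>;
}

// nullptr carries its own type, so generic code can recognize it.
static_assert(deduces_as_nullptr(nullptr));
// A literal 0 is deduced as int, just like 42, so it cannot be told apart
// from an ordinary integer.
static_assert(!deduces_as_nullptr(0));
static_assert(!deduces_as_nullptr(42));
```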

I cannot call `copy_and_verify` on `tainted<void*>`

RLBox does not allow copy_and_verify on tainted<void*> as it could lead to some anti-patterns in verifiers. Cast it to a different tainted pointer with sandbox_reinterpret_cast and then call copy_and_verify. Alternatively, you can use the UNSAFE_unverified API to do this without casting.

Alternate isolation backends

WebAssembly with wasm2c is just one way to isolate running code. We have focussed on this so far as this is the approach that is being used in production in Firefox today. However, RLBox's plugin model is completely general, and can be configured to support other isolation backends as well.

Note that the below plugins are experimental and are not actively maintained (which may mean they have compilation bugs that you'd have to fix). This is meant purely as a reference for you to write your own plugins.

Experimental/previously-used RLBox plugins for isolation

Using LFI (See this chapter for more details)
https://github.com/UT-Security/rlbox_lfi_sandbox
Using WebAssembly through Lucet
https://github.com/PLSysSec/rlbox_lucet_sandbox
Using WebAssembly through Wamr
https://github.com/PLSysSec/rlbox_wamr_sandbox
Using WebAssembly through Wasmtime
https://github.com/PLSysSec/rlbox_wasmtime_sandbox
Using Google's Native Client
https://github.com/PLSysSec/rlbox_nacl_sandbox

Enforcing isolation with the LFI sandbox

Note: this section is written as a continuation of the tutorial, as an alternative to using RLBox with wasm2c. It's best if you follow the tutorial up to that point and continue reading below.

The noop backend makes it easy to add security checks. However, it does not enforce isolation. To finish sandboxing your library, we will need to:

Update the application main.cpp to use the lfi sandbox backend instead of noop.
Compile our library, e.g. mylib.c, using the LFI compiler to a native sandboxed object, and link this into our application.

We will look at each of these steps next and end with instructions on how you can try this out.

Modifying the application to use the lfi RLBox plugin/backend

Making this change is very simple with RLBox. In fact, it can be done exclusively in the boilerplate. Here is the boilerplate to use the lfi backend.

// We're going to use RLBox in a single-threaded environment.
#define RLBOX_SINGLE_THREADED_INVOCATIONS

#include "rlbox.hpp"
#include "rlbox_lfi_sandbox.hpp"

using namespace rlbox;

extern "C" {
extern uint8_t mylib_start[];
extern uint8_t mylib_end[];
};

// Define base type for mylib using the lfi sandbox
RLBOX_DEFINE_BASE_TYPES_FOR(mylib, lfi);

You'll probably notice that there are only a handful of changes.

We now use rlbox_lfi_sandbox.hpp instead of rlbox_noop_sandbox.hpp
The sandbox type which is the second parameter to the macro RLBOX_DEFINE_BASE_TYPES_FOR has now changed to lfi from noop
The boilerplate for RLBOX_USE_STATIC_CALLS has been removed as lfi backend doesn't need this, unlike say the noop backend
The lfi backend/plugin requires an extra piece of boilerplate which is to define the beginning and end of the lfi library that has been embedded into the application as a datablob. This is shown as mylib_start and mylib_end here. We will discuss how to generate these variables in the next section.

These are mostly mechanical changes and are straightforward. Modifying the build is perhaps slightly more challenging as building lfi libraries involves multiple steps.

Modifying the build to produce the lfi sandboxed library

To show how we will update the build, we will use two CMakeLists.txt files as a reference

The CMakeLists used by the noop sandbox
The CMakeLists used by the lfi sandbox

As you can see the lfi CMakeLists is quite a bit longer. Below, we will give a high-level overview of the steps, so you can follow what is happening in the LFI build.

To build and use the LFI sandboxed library, we need several additional repos/tools

We will need the rlbox lfi plugin/backend
We will need a version of clang that can produce LFI files. We will use the prebuilt binaries of LFI clang available in the official repo, which bring the compiler, the custom libc and so on.
Finally, we need the lfi-runtime, which supports the loading and execution of LFI runtimes.

After we download these repos, we can then take the following steps

Build the lfi runtime. This is a project that can be built using meson. You can read more about how to build lfi in their readme
We need to compile our mylib.c to a sandboxed ELF binary (${MYLIB_ELF} in the command below) using lfi's clang. The command in the CMakeLists.txt that does this is
```
 PATH=$ENV{PATH}:${lficlang_SOURCE_DIR}/lfi-bin
     ${lficlang_SOURCE_DIR}/bin/clang
     ${LFI_SBX_BUILD_TYPE_FLAGS}
     -Wl,--export-dynamic
     -static-pie
     -o ${MYLIB_ELF}
     ${CMAKE_SOURCE_DIR}/mylib.c
     -L ${LFI_SYSROOT_PATH} -lboxrt
```
This command starts by adding the lfi clang folder to the $PATH. ${LFI_SBX_BUILD_TYPE_FLAGS} is just going to be -O0 or -O3. LFI requires that the -Wl,--export-dynamic and -static-pie flags are present in the compilation so that the produced code is a position-independent executable that has a symbol table. Finally, the produced ELF is linked with LFI's in-sandbox runtime libboxrt.
Next we will create a simple assembly file that just embeds the produced lfi binary in a datablob as part of the application. This can be done easily with a file like this which includes the binary as a datablob between two symbols ${INCSTUB_FILENAME}_start and ${INCSTUB_FILENAME}_end. We define ${INCSTUB_FILENAME} to be mylib in our example, so we get the required mylib_start and mylib_end symbols.
Finally, we can now build our application, including the file with the datablob, to embed the lfi-sandboxed library as part of the application. We also have to link in the lfi-runtime's liblfi, which is needed to instantiate and destroy sandboxes.

Building and running the lfi backend

To build this example on your machine, run the following commands

cd rlbox-book/src/examples/lfi-hello-example
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Debug
cmake --build ./build --config Debug --parallel

Then run it to make sure everything is working.

./build/main

You should see the following output:

Hello from mylib
Got array value 7
echo: hi hi!
hello_cb: hi again!

Additional content

This section contains additional resources such as useful recipes, tutorials which may have additional content (but may be slightly out of date) etc.

Modifying a Makefile based project to compile to Wasm

This tutorial assumes you have the rlbox_wasm2c_sandbox git repo in the path $(RLBOX_WASM2C_PATH), and that you have installed wasi-sdk on your computer in the path $(WASI_SDK_PATH).

Build the sources of your library along with the file $(RLBOX_WASM2C_PATH)/c_src/wasm2c_sandbox_wrapper.c. Pass the flags -Wl,--export-all -Wl,--stack-first -Wl,-z,stack-size=262144 -Wl,--no-entry -Wl,--growable-table -Wl,--import-memory -Wl,--import-table to the linker using the wasi-clang compiler. This will produce a wasm module.

To edit an existing Make-based build system, you can run the following commands.

$(WASI_SDK_PATH)/bin/clang --sysroot $(WASI_SDK_PATH)/share/wasi-sysroot $(RLBOX_WASM2C_PATH)/c_src/wasm2c_sandbox_wrapper.c -c -o $(RLBOX_WASM2C_PATH)/c_src/wasm2c_sandbox_wrapper.o

AR=$(WASI_SDK_PATH)/bin/ar                                  \
CC=$(WASI_SDK_PATH)/bin/clang                               \
CXX=$(WASI_SDK_PATH)/bin/clang++                            \
CFLAGS="--sysroot $(WASI_SDK_PATH)/share/wasi-sysroot"      \
LD=$(WASI_SDK_PATH)/bin/wasm-ld                             \
LDLIBS=$(RLBOX_WASM2C_PATH)/c_src/wasm2c_sandbox_wrapper.o  \
LDFLAGS="-Wl,--export-all -Wl,--stack-first -Wl,-z,stack-size=262144 -Wl,--no-entry -Wl,--growable-table -Wl,--import-memory -Wl,--import-table"   \
make

Using a Wasm module with imported memory and imported tables

By default, RLBox operates on Wasm modules with an imported memory and imported table. This allows RLBox to optimize the memory allocation's alignment for its internal operations. However, you can also use modules with exported memory and exported tables (albeit with some performance penalty).

To do this, adjust the flags during the compilation of your Wasm module to export memory and tables. Specifically, remove the arguments -Wl,--import-memory -Wl,--import-table if present, and use the arguments -Wl,--export-all -Wl,--export-table.

Additional material

The best example of how to use RLBox is to see its use in Firefox. The Firefox code search is a great way to do this.
Working through the simple library example repo is a good next step to get a feel for retrofitting a simple application that uses a potentially buggy library. The solution is available in the solution folder in the same repo.
Documentation of the core RLBox APIs.
Short tutorial on using the RLBox APIs. Note that this tutorial uses the old Lucet Wasm compiler.
The RLBox test suite itself has a number of examples.
Finally, the original academic paper explaining RLBox and its use in Firefox, presented at USENIX Security 2020, and the accompanying video explanations are a good way to get an overview of RLBox.

Practical third-party library sandboxing with RLBox