Overview
RLBox is a toolkit for sandboxing third-party C libraries that are used by C++ code (support for other languages is in the works). RLBox was originally developed for Firefox [1], which has been shipping with it since 2020.
The RLBox toolkit consists of:
- A C++ framework (RLBox) that makes it easy to retrofit existing application code to safely interface with sandboxed libraries.
- An RLBox plugin that allows the use of the wasm2c compiler for isolating (sandboxing) C libraries with Wasm.
In this section, we provide an overview of the RLBox framework, its reason for being, and a high level sketch of how it works. In the next section, we will provide a tutorial that provides an end-to-end example of applying RLBox to a simple application.
Why RLBox
Work on RLBox began several years ago while attempting to add fine grain isolation to third party libraries in the Firefox renderer. Initially we attempted this process without any support from a framework like RLBox, instead attempting to manually deal with all the details of sandboxing such as sanitizing untrusted inputs, and reconciling ABI differences between the sandbox and host application.
This went poorly; it was tedious, error prone, and did nothing to abstract the details of the underlying sandbox from the developer. We had basically no hope that this would result in code that was maintainable, or that normal Mozilla developers who were unfamiliar with the gory details of our system would be able to sandbox a new library, let alone maintain existing ones.
So we scrapped this manual approach and built RLBox [1].
RLBox automates many of the low level details of sandboxing and allows you, as a security engineer or application developer, to instead focus just on what you need to do to sandbox your particular application.
To sandbox a library, and thus to move to a world where the library is no longer trusted, we need to modify the application-library boundary. For example, we need to add security checks in Firefox to ensure that any value from the sandboxed library is properly validated before it is used. Otherwise, the library (when compromised) may be able to abuse Firefox code to hijack its control flow [1]. The RLBox API is explicitly designed to make retrofitting of existing application code simpler and less error-prone [2].
What does RLBox provide?
RLBox ensures that a sandboxed library is memory isolated from the rest of the application — the library cannot directly access memory outside its designated region — and that all boundary crossings are explicit. This ensures that the library cannot, for example, corrupt Firefox's address space. It also ensures that Firefox cannot inadvertently expose sensitive data to the library. The figure below illustrates this idea.
Memory isolation is enforced by the underlying sandboxing mechanism (e.g., using Wasm) from the start, when you create the sandbox with create_sandbox(). Explicit boundary crossings are enforced by RLBox (either at compile time or at run time). For example, with RLBox you can't call library functions directly; instead, you must use the invoke_sandbox_function() method. Similarly, the library cannot call arbitrary Firefox functions; instead, it can only call functions that you expose with the register_callback() method. (To simplify the sandboxing task, though, RLBox does expose a standard library as described in the Standard Library.)
When calling a library function, RLBox copies simple values into the sandbox memory before calling the function. For larger data types, such as structs and arrays, you can't simply pass a pointer to the object. This would leak ASLR and, more importantly, would not work: sandboxed code cannot access application memory. So, you must explicitly allocate memory in the sandbox via malloc_in_sandbox() and copy application data to this region of memory (e.g., via strlcpy).
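The allocate-then-copy pattern can be sketched without RLBox. The class below is a toy model of a sandbox memory region; `ToySandboxMemory`, `alloc`, `copy_string_in`, and `view` are hypothetical names for illustration and are not part of the RLBox API. It shows why the host hands out offsets into the sandbox region rather than host pointers, and why every copy is bounded by the allocation size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a sandbox's memory region (not an RLBox type).
class ToySandboxMemory {
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    explicit ToySandboxMemory(std::size_t size) : bytes_(size, 0), next_(0) {}

    // Bump-allocate n bytes inside the region. Returns an offset into the
    // region (never a host pointer), or npos if there is not enough space.
    std::size_t alloc(std::size_t n) {
        if (n > bytes_.size() - next_) return npos;
        std::size_t off = next_;
        next_ += n;
        return off;
    }

    // Copy a host string into the region at off, writing at most cap bytes
    // and always NUL-terminating (truncates if src is too long).
    bool copy_string_in(std::size_t off, std::size_t cap, const char* src) {
        if (cap == 0 || off > bytes_.size() || cap > bytes_.size() - off) {
            return false;  // destination range is not inside the region
        }
        std::size_t len = std::strlen(src);
        if (len >= cap) len = cap - 1;
        std::memcpy(bytes_.data() + off, src, len);
        bytes_[off + len] = '\0';
        return true;
    }

    // Read-only view of the region at off, for inspection by the host.
    const char* view(std::size_t off) const {
        assert(off < bytes_.size());
        return bytes_.data() + off;
    }

private:
    std::vector<char> bytes_;
    std::size_t next_;
};
```

In RLBox itself, malloc_in_sandbox() plays the role of `alloc` here, and the copy is done through RLBox's own helpers, as the tutorial shows later.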
RLBox similarly copies simple return values and callback arguments. Larger data structures, however, must (again) be passed by sandbox-reference, i.e., via a reference/pointer to sandbox memory.
To ensure that application code doesn't unsafely use values that originate in the sandbox (and may thus be under the control of an attacker), RLBox considers all such values untrusted and taints them. Tainted values are essentially opaque values (though RLBox does provide some basic operators on tainted values). To use a tainted value, you must unwrap it by (typically) copying the value into application memory, and thus out of the reach of the attacker, and verifying it. Indeed, RLBox forces application code to perform the copy and the verification together using verification functions (see Untainting values in the tutorial).
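The copy-then-verify idea can be illustrated with a small wrapper type. This is a toy model under stated assumptions, not RLBox's implementation: `ToyTainted` and `safe_index` are hypothetical names, and RLBox's real tainted types do considerably more (for example, handling pointers and ABI differences).

```cpp
#include <cassert>
#include <utility>

// Hypothetical toy model of a tainted value (not an RLBox type).
template <typename T>
class ToyTainted {
public:
    explicit ToyTainted(T raw) : raw_(raw) {}

    // The only way to reach the raw value: copy it first, then hand the copy
    // to a verifier, so the check and the later use see the same value.
    template <typename Verifier>
    auto copy_and_verify(Verifier&& verifier) const {
        T copy = raw_;  // copy out of (simulated) sandbox memory
        return std::forward<Verifier>(verifier)(copy);
    }

private:
    T raw_;  // deliberately no accessor and no implicit conversion to T
};

// Example use: turn an untrusted index into one that is safe for an array
// of the given size, falling back to 0 for out-of-range values.
inline unsigned safe_index(const ToyTainted<unsigned>& t, unsigned size) {
    assert(size > 0);
    return t.copy_and_verify([size](unsigned v) { return v < size ? v : 0u; });
}
```

Because the wrapper exposes no conversion to `T`, code that forgets the verifier fails to compile instead of silently using attacker-controlled data.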
References
[1] Retrofitting Fine Grain Isolation in the Firefox Renderer, by S. Narayan et al.
[2] The Road to Less Trusted Code: Lowering the Barrier to In-Process Sandboxing, by T. Garfinkel et al.
The RLBox Tutorial
In this tutorial we will walk you through the steps of adding sandboxing to a very simple application and library. However, all the basic steps generalize to more complex examples.
We will start by describing the simple application that uses a library, and then describe how to sandbox this in two parts.
- In the first part, we will look at how to use RLBox to retrofit sandboxing in an existing application, taking all the steps to ensure that control flow and data flow across the application-library boundary are secure.
- In the second part, we will look at how to rebuild our library with Wasm and link this into our application, so our library is isolated.
Once we complete these two steps, our library will be securely isolated from our application.
Downloading and running the examples
To get the source code for the examples in the tutorial, download the repo as shown below:
git clone https://github.com/PLSysSec/rlbox-book
The chapters going forward will give commands on how to build and run these examples.
The examples in this tutorial are self-contained and pull in the repos they need. For reference, however, RLBox currently spans two repositories: one contains just the RLBox C++ framework, and the other contains the RLBox plugin for Wasm files compiled with the wasm2c compiler (which converts your C library to an isolated and sandboxed version). The two repos are available here:
- The core RLBox library is available at https://github.com/PLSysSec/rlbox
- The Wasm2c RLBox plugin is available at https://github.com/PLSysSec/rlbox_wasm2c_sandbox
The example in this tutorial
For our tutorial, we're going to be sandboxing a small application that uses a library called mylib.
Our example library
mylib declares four functions in mylib.h:
#pragma once
#ifdef __cplusplus
extern "C" {
#endif
void hello();
unsigned int add(unsigned int, unsigned int);
void echo(const char* str);
void call_cb(void (*cb) (const char* str));
#ifdef __cplusplus
}
#endif
And implements those functions in mylib.c:
#include <stdio.h>
#include "mylib.h"
void hello() {
printf("Hello from mylib\n");
}
unsigned int add(unsigned int a, unsigned int b) {
return a + b;
}
void echo(const char* str) {
printf("echo: %s\n", str);
}
void call_cb(void (*cb) (const char* str)) {
cb("hi again!");
}
While this library is very simple, it will allow us to exercise key features of RLBox in the next chapters, including calling functions, copying strings into the sandbox, and registering and handling callbacks from the library.
Our example application
The main application in main.cpp simply invokes each of these functions in turn.
#include <stdio.h>
#include <stdlib.h>
#define release_assert(cond, msg) if (!(cond)) { fputs(msg "\n", stderr); abort(); }
#include "mylib.h"
using namespace std;
// Declare callback function that's going to be invoked from the library.
void hello_cb(const char* str);
int main(int argc, char const *argv[]) {
// Call the library hello function
hello();
int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
// Call the library add function
auto ret = add(3, 4);
auto array_val = array[ret];
printf("Got array value %d\n", array_val);
// Call the library echo function
const char* helloStr = "hi hi!";
echo(helloStr);
// Call the library function call_cb, passing in the callback hello_cb
call_cb(hello_cb);
return 0;
}
void hello_cb(const char* str)
{
release_assert(str != nullptr, "Expected value for string");
printf("hello_cb: %s\n", str);
}
To build this example on your machine, run the following commands
cd rlbox-book/src/examples/hello-example
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Debug
cmake --build ./build --config Debug --parallel
Then run it to make sure everything is working.
./build/main
You should see the following output:
Hello from mylib
Got array value 7
echo: hi hi!
hello_cb: hi again!
Retrofitting isolation in our example
This first part of the tutorial focuses on modifying our application to add sandboxing; the next part focuses on recompiling our library with Wasm to enforce isolation.
In this example, we're going to use the noop sandbox backend. The noop sandbox does not actually enforce isolation; it is simply a tool that makes it easier to port new libraries to RLBox. The noop sandbox does nothing more than turn our calls into the RLBox sandbox into normal function calls to the library we already have linked in our application.
The reason for this noop backend is that it supports incrementally porting our application. Instead of having to change all our library interfaces at once (to account for ABI differences between a sandbox and our normal library) and deal with the resulting headaches, we can gradually change our function calls from normal library calls to sandbox calls, and at each step test that our application continues to work as expected.
Creating a noop sandbox
To get started, in our main application (main.cpp) let's first import the RLBox library and add some necessary boilerplate at the top of the file:
// We're going to use RLBox in a single-threaded environment.
#define RLBOX_SINGLE_THREADED_INVOCATIONS
// The fixed configuration line we need to use for the noop sandbox.
// It specifies that all calls into the sandbox are resolved statically.
#define RLBOX_USE_STATIC_CALLS() rlbox_noop_sandbox_lookup_symbol
#include "rlbox.hpp"
#include "rlbox_noop_sandbox.hpp"
using namespace rlbox;
// Define base type for mylib using the noop sandbox
RLBOX_DEFINE_BASE_TYPES_FOR(mylib, noop);
Why the boilerplate?
RLBox supports different kinds of sandboxing backends/plugins, so we need to specify which backend we will use and how it is configured. While a lot of effort has gone into avoiding boilerplate, a certain amount either cannot be removed or would be too costly to remove; those are the bits you are seeing here.
What does this boilerplate do?
Let's briefly go through each piece of the boilerplate to understand its purpose.
#define RLBOX_SINGLE_THREADED_INVOCATIONS tells RLBox that only one thread in our host application will invoke functions in the sandboxed library at a given time. (There can be multiple application threads that all invoke functions, but the host application must ensure that only one thread executes functions in the sandboxed library at a time. This can be done with a per-sandbox lock.) This macro allows RLBox to elide several internal mutex calls, which greatly speeds it up. If you want to support multiple threads calling into the same sandbox concurrently, omit this macro.
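The per-sandbox lock mentioned above could look like the following sketch. `LockedSandbox`, `with_sandbox`, and `CountingSandbox` are hypothetical names, not RLBox APIs; the template parameter stands in for whatever sandbox type the application uses.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical wrapper: pairs one sandbox-like object with one mutex so that
// at most one host thread is executing inside that sandbox at a time.
template <typename Sandbox>
class LockedSandbox {
public:
    // Run fn(sandbox) while holding the per-sandbox lock and return its result.
    template <typename Fn>
    auto with_sandbox(Fn&& fn) {
        std::lock_guard<std::mutex> guard(lock_);
        return std::forward<Fn>(fn)(sandbox_);
    }

private:
    std::mutex lock_;
    Sandbox sandbox_;
};

// Minimal stand-in for a sandbox, used only to exercise the wrapper.
struct CountingSandbox {
    int calls = 0;
};
```

With a real sandbox, the callable passed to `with_sandbox` would be the place where the application calls into the sandboxed library, so all entries into that sandbox are serialized by the one lock.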
#define RLBOX_USE_STATIC_CALLS() rlbox_noop_sandbox_lookup_symbol tells RLBox that the noop sandbox makes static function calls into the library. For technical reasons, not all sandboxes support direct function calls, so some sandbox backends may rely on indirect function calls via pointers into the sandboxed library. Unfortunately, this is something RLBox must know up front, so we have to specify it. For all practical purposes, just think of this line as something you should specify as is. Each sandbox backend/plugin will have a version of this line in its documentation that you can copy as is.
We have #included two headers: rlbox.hpp, which is the base RLBox library, and rlbox_noop_sandbox.hpp for the noop sandbox backend.
RLBOX_DEFINE_BASE_TYPES_FOR defines tainted types that are specific to each library. In our example, this macro now gives us tainted_mylib<T>, which will automatically map to rlbox::tainted<T, rlbox_noop_sandbox>. If we change the sandbox plugin/backend in the future, the mapping will change automatically.
Creating and destroying sandboxes
Now that the boilerplate is out of the way, let's create a new sandbox instance at the top of the main function that we will use in this application.
// Declare and create a new sandbox
rlbox_sandbox_mylib sandbox;
sandbox.create_sandbox();
and destroy the sandbox at the end of main:
// destroy sandbox
sandbox.destroy_sandbox();
Note: We can create multiple sandbox instances if we want. You can think of each sandbox instance as an isolated instance of the library. Each instance cannot interfere with another instance.
To see where this could be useful, consider securing a webserver that parses XML data from each incoming connection. You could sandbox the XML parsing library and spin up a single sandbox. This would ensure the server doesn't get compromised due to an XML parsing bug, however it won't prevent one malicious connection from interfering with the parsed XML contents of a different connection. However, you could spin up a new sandboxed XML-parser library instance for each incoming connection. This architecture would guarantee that a bug while processing an XML parameter in one of the connections will not spill over to processing of other connections.
Sandboxing function calls
We now move on to sandboxing the function calls made by the application to mylib. We can see that the application calls the hello function in the library.
hello();
Sandboxing this call is as simple as changing the syntax to:
sandbox.invoke_sandbox_function(hello);
We have changed our code to not call hello() directly. Instead, we use RLBox's invoke_sandbox_function() method. This allows RLBox to mediate the function calls into the sandbox.
Calling sandboxed functions and verifying their return value
Let's now sandbox the call to the add function. We can see that our application calls the add function as shown below.
int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
// Call the library add function
auto ret = add(3, 4);
auto array_val = array[ret];
To change the function call, we will use invoke_sandbox_function as before.
auto ret = sandbox.invoke_sandbox_function(add, 3, 4);
There are now a couple of interesting things happening.
- First, add has arguments. These arguments are primitive types, so RLBox doesn't impose any restrictions and you can pass them to the sandboxed function as is (complex arguments have some restrictions as we will see later).
- Second, if you check the type of ret, you'll see that RLBox ensures that the return value from add is tainted and thus cannot be used without verification. Concretely, the type of ret is now tainted_mylib<unsigned int>. Thus, if you try to compile this program, you will get a compilation error stating that a value of type tainted_mylib<unsigned int> cannot be used as an array index.
To convert ret back to an unsigned int, we have to verify it by calling the copy_and_verify() API. This API copies the value into application memory and runs a verifier function that we specify. The verifier should ensure that ret does not contain an unexpected value. For now, let's add just the call to copy_and_verify() so we can unwrap ret from a tainted_mylib<unsigned int> to an unsigned int, without worrying about the verifier.
auto ret = sandbox.invoke_sandbox_function(add, 3, 4)
.copy_and_verify([](unsigned val){
// .. to be specified ..
return val;
});
In the next chapter, we will discuss what we can put in the verifier to ensure the safety of ret.
Untainting values
Continuing our example, we need to figure out what values of ret are safe, and write a verifier that checks that ret has one of these safe values.
So the question is: what do we put in the verifier for ret to remain safe? Perhaps unsurprisingly, the answer here is "it depends". However, the intuition is that the safety check in the verifier should ensure that ret has a value that does not cause a memory safety issue in the rest of the program.
Let's continue with our example of verifying ret, which is the tainted return value of add.
Let's look at how ret is used to figure this out.
int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
// Call the library add function
auto ret = add(3, 4);
auto array_val = array[ret];
In this example, this is simple. We can see that ret is only used in one place:
auto array_val = array[ret];
Thus, we simply need to ensure that ret is a valid index into array, i.e., that it is smaller than the size of array. We can ensure this by writing the following:
auto ret = sandbox.invoke_sandbox_function(add, 3, 4)
.copy_and_verify([](unsigned val){
release_assert(val < 10, "Unexpected result");
return val;
});
Here, release_assert is just a small macro in the file that calls abort() if the check fails.
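One way to keep such a bound from drifting out of sync with the array is to derive it from the array type. The helper below is a sketch in plain C++ that does not depend on RLBox; `checked_index` is a hypothetical name introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Returns val if it is a valid index into arr, and aborts otherwise.
// The bound N is deduced from the array type, so it cannot go stale when
// the array is resized.
template <typename T, std::size_t N>
std::size_t checked_index(const T (&arr)[N], std::size_t val) {
    (void)arr;  // only the array's type is needed
    if (val >= N) {
        std::fputs("Unexpected result\n", stderr);
        std::abort();
    }
    return val;
}
```

Inside the verifier from the tutorial, the body could then be `return checked_index(array, val);` with `array` captured by reference, instead of hard-coding the constant 10.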
What happens if we get the verifier wrong?
Unfortunately, there is nothing RLBox can do to make sure that your verifier is correct. Ultimately, verifiers are part of your trusted code base (TCB), and you have to get them right. However, the one upside of RLBox is that all verifiers are clearly marked, making them easier to check during a security audit. It is also possible that static analysis tools can be configured to sanity check the verifiers.
What are the various untainting APIs that I can use for different types?
We discuss this in more detail in the advanced topics chapter.
What happens if we can't figure out a verifier?
We discuss this in more detail in the advanced topics chapter.
Handling tainted strings
Let's now sandbox the call to the echo function, which takes a slightly more interesting argument: a string. Here, we can't simply pass a string literal as an argument: the sandbox cannot access application memory where this would be allocated. RLBox thus prevents this code from compiling.
To fix the compilation error, we must allocate a buffer in sandbox memory and copy the string we want to pass to echo into this region. We can do this with the following code:
// Call the library echo function
const char* helloStr = "hi hi!";
size_t helloSize = strlen(helloStr) + 1;
tainted_mylib<char*> taintedStr = sandbox.malloc_in_sandbox<char>(helloSize);
strncpy(sandbox, taintedStr, helloStr, helloSize);
Here taintedStr is a tainted string: it lives in the sandbox memory and could be written to by the (compromised) library code concurrently. We have allocated it by calling the malloc_in_sandbox API.
After this, we have to copy the string to the sandbox. Normally, we could copy strings with strncpy; however, the destination here is of type tainted<char*>, which strncpy does not accept. To make this simpler, RLBox provides rlbox::strncpy, which allows passing tainted strings as destinations. The only difference in the signature of rlbox::strncpy compared to strncpy is that the first parameter must be the sandbox. Internally, RLBox ensures that the string copies remain within the sandbox boundary.
Note: if you do need to convert a tainted pointer to a raw pointer, you can do so by following the approach listed in the advanced topics chapter.
Now, we can just call the function and free the allocated string:
sandbox.invoke_sandbox_function(echo, taintedStr);
sandbox.free_in_sandbox(taintedStr);
Handling callbacks
Finally, let's sandbox the call to the call_cb
function. To do this, we need
to modify the callback to have a signature that RLBox permits. Currently the
callback looks like this:
void hello_cb(const char* str)
{
release_assert(str != nullptr, "Expected value for string");
printf("hello_cb: %s\n", str);
}
To modify this to a signature RLBox will allow, we need to:
- Set the first parameter to be a reference to the sandbox.
- Make all parameters and return values tainted. (A void return does not need to be tainted.)
With this change, the callback will now look like
void hello_cb(rlbox_sandbox_mylib& sandbox, tainted_mylib<const char*> str)
This callback is called with a tainted string. To actually use the tainted string we need to verify it. To do this, we use the string verification function copy_and_verify_string() with a simple verifier:
str.copy_and_verify_string([](unique_ptr<char[]> val) {
release_assert(val != nullptr && strlen(val.get()) < 1024, "val is null or greater than 1024\n");
return move(val);
});
This verifier moves the string if it is not null and if its length is less than 1KB. In the callback we simply print this string.
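A string verifier does not have to abort on bad input; since verifiers may return any type, it can also fall back to a safe default. The sketch below is plain C++ and independent of RLBox. `accept_short_string` is a hypothetical helper name, and the sketch assumes the buffer handed to the verifier is a NUL-terminated copy that already lives in application memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical verifier body: accept the copied string only if it is
// non-null and shorter than max_len; otherwise return the fallback.
// Assumes val points to a NUL-terminated copy owned by the application.
inline std::string accept_short_string(std::unique_ptr<char[]> val,
                                       std::size_t max_len,
                                       const std::string& fallback = "") {
    if (val == nullptr) return fallback;
    if (std::strlen(val.get()) >= max_len) return fallback;
    return std::string(val.get());
}
```

Whether to abort (as release_assert does) or to substitute a default depends on whether the application can continue sensibly after the library misbehaves.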
Let's now continue back in main. To invoke call_cb with the callback, we first need to register the callback (otherwise RLBox will disallow the library-application call) and then pass the registered callback to the call_cb function:
// register callback
auto cb = sandbox.register_callback(hello_cb);
// Call the library function call_cb, passing in the callback hello_cb
sandbox.invoke_sandbox_function(call_cb, cb);
Note that cb here is an RAII type, meaning the callback is automatically unregistered when cb goes out of scope. If you want the callback to be registered for longer, make sure to keep cb alive.
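The RAII behaviour can be pictured with a toy registry. This is a sketch of the general pattern only; `ToyCallbackRegistry`, `Handle`, and `toy_hello_cb` are hypothetical names, and RLBox's real callback type is different and more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical registry of callbacks that the sandbox may invoke.
class ToyCallbackRegistry {
public:
    using Callback = void (*)(const char*);

    // RAII handle: the callback stays registered only while the handle lives.
    class Handle {
    public:
        Handle(ToyCallbackRegistry& reg, Callback cb) : reg_(&reg), cb_(cb) {
            reg_->allowed_.push_back(cb_);
        }
        Handle(const Handle&) = delete;
        Handle& operator=(const Handle&) = delete;
        Handle(Handle&& other) noexcept : reg_(other.reg_), cb_(other.cb_) {
            other.reg_ = nullptr;  // the moved-from handle must not unregister
        }
        ~Handle() {
            if (reg_ == nullptr) return;
            auto& v = reg_->allowed_;
            v.erase(std::remove(v.begin(), v.end(), cb_), v.end());
        }

    private:
        ToyCallbackRegistry* reg_;
        Callback cb_;
    };

    Handle register_callback(Callback cb) { return Handle(*this, cb); }

    bool is_allowed(Callback cb) const {
        return std::find(allowed_.begin(), allowed_.end(), cb) != allowed_.end();
    }

private:
    std::vector<Callback> allowed_;
};

// Trivial callback used to exercise the registry.
inline void toy_hello_cb(const char*) {}
```

The practical consequence is the same as in the tutorial: if the handle is a local variable, the callback stops being callable from the sandbox as soon as that scope ends.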
Building and running
To build this example on your machine, run the following commands
cd rlbox-book/src/examples/noop-hello-example
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Debug
cmake --build ./build --config Debug --parallel
Then run it to make sure everything is working.
./build/main
You should see the following output:
Hello from mylib
Got array value 7
echo: hi hi!
hello_cb: hi again!
Enforcing isolation with the Wasm2c sandbox
The noop backend makes it easy to add security checks. However, it does not enforce isolation. To finish sandboxing our library, we will need to:
- Update the application main.cpp to use the wasm2c sandbox backend instead of noop.
- Compile our library (e.g., mylib.c) to Wasm (i.e., mylib.wasm), which adds isolation to the library, and then compile the resulting mylib.wasm file to C (mylib.wasm.c and mylib.wasm.h) with the wasm2c compiler, which allows it to be compiled and linked with our application.
We will look at each of these steps next and end with instructions on how you can try this out.
Modifying the application to use the wasm2c RLBox plugin/backend
Making this change is very simple with RLBox. In fact, it can be done
exclusively in the boilerplate. Here is the boilerplate to use the wasm2c
backend.
// We're going to use RLBox in a single-threaded environment.
#define RLBOX_SINGLE_THREADED_INVOCATIONS
// The fixed configuration line we need to use for the wasm2c sandbox.
// It specifies that all calls into the sandbox are resolved statically.
#define RLBOX_USE_STATIC_CALLS() rlbox_wasm2c_sandbox_lookup_symbol
// The rlbox wasm2c plugin requires that you provide the wasm2c module's name
#define RLBOX_WASM2C_MODULE_NAME mylib
// Include the produced header from wasm2c
#include "mylib.wasm.h"
#include "rlbox.hpp"
#include "rlbox_wasm2c_sandbox.hpp"
using namespace rlbox;
// Define base type for mylib using the wasm2c sandbox
RLBOX_DEFINE_BASE_TYPES_FOR(mylib, wasm2c);
You'll probably notice that there are only a handful of changes.
- We now use rlbox_wasm2c_sandbox.hpp instead of rlbox_noop_sandbox.hpp.
- The sandbox type, which is the second parameter to the macro RLBOX_DEFINE_BASE_TYPES_FOR, has now changed to wasm2c from noop.
- The boilerplate for RLBOX_USE_STATIC_CALLS has changed to use the wasm2c backend's boilerplate.
- The wasm2c backend/plugin requires an extra piece of boilerplate, which is the name of the Wasm module as specified in the macro RLBOX_WASM2C_MODULE_NAME.
- The wasm2c backend/plugin requires the produced mylib.wasm.h (we'll discuss how to produce this in the next section) to be included in the file.
These are mostly mechanical changes and are straightforward. Modifying the build is perhaps slightly more challenging as building wasm libraries involves multiple steps.
Modifying the build to produce the Wasm sandboxed library
To show how we will update the build, we will use two CMakeLists.txt files as a reference
- The CMakeLists used by the noop sandbox
- The CMakeLists used by the wasm2c sandbox
As you can see the wasm CMakeLists is quite a bit longer. Below, we will give a high-level overview of the steps, so you can follow what is happening in the Wasm build.
To build and use the Wasm sandboxed library, we need several additional repos/tools
- We will need the rlbox wasm2c plugin/backend
- We will need a version of clang that can produce Wasm files, specifically, Wasm files that target WebAssembly System Interface (WASI). WASI is a group of standards-track API specifications designed to provide a secure standard interface for Wasm applications. Specifically, you need WASI if you want to use printf, timers, anything that makes a syscall. We will thus rely on the wasi-sdk, which provides wasi-clang, a version of clang that can target WASI, and wasi-libc (a custom version of musl libc modified for this use case).
- Finally, we will use the wasm2c Wasm compiler. Wasm files need to be compiled into native libraries that can be linked into your application. Unlike regular native libraries, however, libraries produced by a sandboxing compiler are guaranteed to be sandboxed. The wasm2c compiler in particular compiles Wasm files by first transpiling them to C (this produced C is basically machine code with a lot of sandboxing checks, and is not going to be readable), and then compiling the resulting C with a regular C compiler to produce native objects.
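To give a feel for what a "sandboxing check" means, the sketch below models a memory region in which every load and store is bounds-checked. It is a hand-written illustration, not actual wasm2c output; `ToyLinearMemory`, `load_u32`, and `store_u32` are hypothetical names, the exception stands in for a trap, and a real implementation may enforce the same property differently (for example, with guard pages rather than explicit comparisons).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical model of a sandbox memory region with checked accesses.
struct ToyLinearMemory {
    std::vector<std::uint8_t> data;

    explicit ToyLinearMemory(std::size_t size) : data(size, 0) {}

    // Throws if [addr, addr + len) is not entirely inside the region.
    void check(std::uint64_t addr, std::size_t len) const {
        if (addr > data.size() || data.size() - addr < len) {
            throw std::out_of_range("access outside sandbox memory");
        }
    }

    // Bounds-checked 32-bit load at byte address addr.
    std::uint32_t load_u32(std::uint64_t addr) const {
        check(addr, sizeof(std::uint32_t));
        std::uint32_t v;
        std::memcpy(&v, data.data() + addr, sizeof v);
        return v;
    }

    // Bounds-checked 32-bit store at byte address addr.
    void store_u32(std::uint64_t addr, std::uint32_t v) {
        check(addr, sizeof v);
        std::memcpy(data.data() + addr, &v, sizeof v);
    }
};
```

Since sandboxed code can only name memory through such checked accesses, a buggy or compromised library cannot read or write outside its own region.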
After we download these repos, we can then take the following steps
- Build the wasm2c sandbox compiler and runtime. This is a project that can be built using CMake. You can read more about how to build wasm2c in their readme.
- We need to compile our mylib.c to mylib.wasm using wasi-clang. The command in the CMakeLists.txt that does this is
  ${wasiclang_SOURCE_DIR}/bin/clang --sysroot ${wasiclang_SOURCE_DIR}/share/wasi-sysroot/ -O3 -Wl,--export-all -Wl,--no-entry -Wl,--growable-table -Wl,--stack-first -Wl,-z,stack-size=1048576 -Wl,--export-table -o ${MYLIB_WASM} ${C_DUMMY_MAIN} ${CMAKE_SOURCE_DIR}/mylib.c
  There are a number of flags that start with -Wl that must be specified so we produce a Wasm file with the properties we'd expect. You can read more about these flags in the Wasm lld docs page. The output file ${MYLIB_WASM} corresponds to mylib.wasm and the input C files are mylib.c and ${C_DUMMY_MAIN}. ${C_DUMMY_MAIN}, as the name indicates, is an empty main function seen here. You could avoid this dummy main by using Wasm's reactor flag.
- Next we need to run wasm2c to transpile our Wasm file back to a C file with checks. This is fairly straightforward, and you can read more about it in the wasm2c documentation.
- We now have to compile the transpiled Wasm file. The process for doing this is described in detail in the wasm2c repo. Broadly, we need to compile the transpiled files with the wasm2c runtime and appropriate includes. This will now generate our native sandboxed library mylib.
- We can now build our application using mylib.
Building and running the wasm2c backend
To build this example on your machine, run the following commands
cd rlbox-book/src/examples/wasm-hello-example
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Debug
cmake --build ./build --config Debug --parallel
Then run it to make sure everything is working.
./build/main
You should see the following output:
Hello from mylib
Got array value 7
echo: hi hi!
hello_cb: hi again!
Advanced RLBox topics
In this chapter, we will go beyond the basic tutorial to understand some more advanced patterns you will encounter when using RLBox.
Tainted computations
RLBox does not permit some operations on tainted values for safety reasons:
- Using a tainted<unsigned int> to index an array in the host, i.e., int[]. This can lead to an out of bounds array access if allowed.
- Branching on a tainted value, i.e., if conditions with tainted values in the condition, or for loops with tainted values in the condition. The only exception here is comparing a tainted pointer to nullptr. So assuming ret is a tainted<int>, code of the form if (ret == 4) {...} would not be allowed. This is because any scenario where an application's control flow is determined by a tainted value is inevitably going to result in security issues. Thus, RLBox returns tainted<bool> during comparisons like ret == 4, which does not allow branching or loops. You can, of course, try to verify the tainted<bool> using copy_and_verify, or better, verify ret prior to comparison to avoid this issue.
- Any form of comparison on the result of dereferencing a tainted pointer. Concretely, code of the form
  tainted<int*> a = ...; tainted<bool> b = (*a == 4);
  would not compile. This is because dereferencing a tainted pointer refers to a location in the sandbox's memory, and the sandbox could change that at any moment. To represent the volatility of this comparison, RLBox returns a new type tainted_bool_hint with the result of this comparison.
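The "no branching on tainted values" rule can be modelled with a wrapper whose comparison returns another wrapper instead of a plain bool. The sketch is a toy under stated assumptions: `ToyTaintedVal` and `is_four` are hypothetical names, and RLBox's real types differ in many details.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical toy model of a tainted value (not an RLBox type).
template <typename T>
class ToyTaintedVal {
public:
    explicit ToyTaintedVal(T raw) : raw_(raw) {}

    // Comparing a tainted value yields a tainted bool, not a bool.
    ToyTaintedVal<bool> operator==(const T& rhs) const {
        return ToyTaintedVal<bool>(raw_ == rhs);
    }

    // Copy the value and pass the copy to the verifier.
    template <typename Verifier>
    auto copy_and_verify(Verifier verifier) const {
        return verifier(raw_);
    }

private:
    T raw_;
};

// A tainted bool has no conversion to bool, so `if (ret == 4)` cannot compile.
static_assert(!std::is_convertible<ToyTaintedVal<bool>, bool>::value,
              "tainted comparisons must not be usable as branch conditions");

// The preferred pattern: verify the value first, then branch on the result.
inline bool is_four(const ToyTaintedVal<int>& ret) {
    int checked = ret.copy_and_verify(
        [](int v) { return (v >= 0 && v < 10) ? v : -1; });
    return checked == 4;
}
```

Verifying before comparing keeps the security decision in one clearly marked place, which is the same property the verifier lambdas give in the tutorial.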
Untainting different types
As you sandbox more library APIs, you will soon have to start verifying (removing the tainting) objects of different types. Below, we'll go through the APIs used to remove tainting in different scenarios.
In all examples, there are a few things to keep in mind:
- We will use a comment <insert security checks here> to indicate that you need to apply domain-specific checking to ensure that you only permit values that do not cause memory safety problems in the rest of your program. The tutorial goes through an example of adding security checks in more detail.
- Note that a verifier can return a value of any type or no value at all. This gives some flexibility in how you manage your data. So for example, you may be untainting an int, but can return an unsigned int or an int*, or void etc.
- We will refer to the term fundamental types frequently. C/C++ defines the fundamental types as the types built into the language, such as int, float etc.
Untainting fundamental types
These types can be untainted with a verifier of the following form
tainted<int> a = ... ; //
int a_verified = a.copy_and_verify([](int val){
// <insert security checks here>
return val;
});
There may be some scenarios where the application can handle any possible integer from the sandbox, i.e., the <insert security checks here> can be left empty.
- An example of this could be when you are calling a sandboxed library function that returns an integer 0 on success and a non-zero error code otherwise. From the host application perspective, you may do something if the returned value is 0, and exit otherwise.
In this scenario, RLBox provides a shorthand API called unverified_safe_because, which can be used as follows.
tainted<int> a = ... ; //
int a_verified = a.unverified_safe_because("Error code. App is robust to all values of error code");
The unverified_safe_because API takes a string argument that allows the developer to document why they are doing this and why it's safe. This string does not have any special meaning in the code. Rather, the RLBox API asks you to provide a free-form string that acts as documentation. Essentially, you are providing a string that says it is safe to remove the tainting from this type because... . Such documentation may be useful to other developers who read your code. The above is equivalent to:
tainted<int> a = ... ; //
int a_verified = a.copy_and_verify([](int val){
// Error code. App is robust to all values of error code
return val;
});
Untainting byte buffers
Sometimes the sandbox returns a buffer that you want to untaint. For example, you may use a sandboxed image decoding library to decode JPEG images. In this example, after the sandboxed library decodes the image, it will produce a byte buffer with image pixels, probably of the type tainted<char*> or some other tainted pointer to a fundamental type.
These types can be untainted with a verifier of the following form
tainted<int*> a = ... ; //
std::unique_ptr<int[]> a_verified = a.copy_and_verify_range([&](std::unique_ptr<int[]> val){
    // <insert security checks here>
    return val;
}, size);
This API copies size elements (size bytes for a char buffer) out of the sandbox and applies the necessary checks, such as ensuring that the entire buffer lies within sandbox memory.
In the example of a sandboxed image decoder, the buffer may hold completely random data instead of pixels of the image. It is up to you to figure out what your application context is and what security checks need to be in place. In the example of a sandboxed image decoder, the usual expectation is that there is no sensible way to check that decoding has occurred correctly, and rather the rest of your program should be robust to showing an incorrect image on the screen. In this case, there would be no security check in place.
Untainting byte buffers without copying
There may be some scenarios where the application can handle any possible byte buffer from the sandbox, i.e., the <insert security checks here> can be left empty.
- An example of this would be if the image being displayed by the app in our sandboxed libjpeg example is, say, an application background, so its exact contents may not really matter.
In this case, we may want to use the byte buffer without making a copy, to avoid overheads.
RLBox provides an API for this called unverified_safe_pointer_because, which can be used as follows.
tainted<char*> a = ...;
char* raw = a.unverified_safe_pointer_because(10, "Demo of a raw pointer");
unverified_safe_pointer_because takes two parameters. The first is the number of bytes in this pointer that you will be accessing. RLBox needs this to ensure that this many bytes of the pointer stay within the sandbox boundary. The second is a string that allows the developer to document why they are doing this and why it's safe. This string does not have any special meaning in the code. Rather, the RLBox API asks you to provide a free-form string that acts as documentation. Essentially, you are providing a string that says "it is safe to remove the tainting from this type because...". Such documentation may be useful to other developers who read your code.
Untainting C-strings
Untainting C-strings of type tainted<char*> is covered in the tutorial.
These types can be untainted with a verifier of the following form
tainted<char*> str = ...;
std::unique_ptr<char[]> checked_string =
    str.copy_and_verify_string([](std::unique_ptr<char[]> val) {
        // <insert security checks here>
        return std::move(val);
    });
The API ensures that the tainted string lives within the sandbox and is null terminated, and makes a copy of the string that you can use.
A useful check in <insert security checks here> is also to limit the size of the string you want to allow.
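A length limit of this kind could look roughly like the sketch below. It is written as a standalone function so that it compiles without RLBox; kMaxLen and verify_short_string are hypothetical names, and returning nullptr to signal rejection is an assumption about how the caller wants to handle bad strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical upper bound on the string length the application accepts.
constexpr std::size_t kMaxLen = 64;

// Verifier body: reject (return nullptr) strings that are missing or too
// long; otherwise hand the already-copied string back to the caller.
inline std::unique_ptr<char[]> verify_short_string(std::unique_ptr<char[]> val) {
  if (!val || std::strlen(val.get()) > kMaxLen) {
    return nullptr;
  }
  return val;
}
```

The caller then has to check the result for nullptr before using it, which keeps the failure case explicit in the host code.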
Untainting one-level pointers to fundamental types
If you have a tainted pointer to a fundamental type such as tainted<int*>, tainted<float*>, etc., these types can be untainted with a verifier of the following form
tainted<int*> a = ... ; //
std::unique_ptr<int> a_verified = a.copy_and_verify([](std::unique_ptr<int> val){
// <insert security checks here>
return val;
});
The idea here is that RLBox is effectively creating a deep clone of the object after doing the required checks of ensuring the pointer is in the sandbox. We would ideally allow this API for more types, but C++ makes it hard to know when we can reasonably perform a deep clone of an object, and hence this API is limited to tainted pointers to fundamental types.
This API is also limited to one-level pointers, i.e., things like tainted<int*>, and is not allowed for tainted<int**>.
Untainting just the "address bits" of a pointer
Your application may sometimes need just the raw bits of a tainted pointer without needing to look at the data being pointed to. An example of this would be if you want to maintain a hashmap of pointers in the class, but the pointers are produced by the sandbox.
tainted<int*> foo = sandbox.invoke_sandbox_function(...);
std::map<int*, int> my_map;
my_map[foo] = 3; // RLBox gives a compiler error
For this scenario, RLBox provides an API called copy_and_verify_address, which takes a verifier that accepts a uintptr_t. This API can be used as follows.
tainted<int*> foo = sandbox.invoke_sandbox_function(...);
// Key the map on the verified address bits (uintptr_t) rather than on int*
std::map<uintptr_t, int> my_map;
uintptr_t foo_verified = foo.copy_and_verify_address([&](uintptr_t addr) {
    // <insert security checks here>
    return addr;
});
my_map[foo_verified] = 3;
Untainting C-arrays of fundamental types
Untainting C-arrays of fundamental types like tainted<int[3]> is possible using copy_and_verify. Note, however, that the API expects the verifier to accept an argument of type std::array, so that the data is correctly copied.
These types can be untainted with a verifier of the following form
tainted<int[3]> a = ... ; //
std::array<int, 3> a_verified = a.copy_and_verify([](std::array<int, 3> val){
// <insert security checks here>
return val;
});
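As an illustration of a verifier over std::array, the sketch below assumes (hypothetically) that the three integers are RGB components that the application expects to lie in 0..255. It is a plain C++ function that compiles without RLBox; verify_rgb is an illustrative name, and clamping rather than rejecting is just one possible policy.

```cpp
#include <array>
#include <cassert>

// Verifier body for a 3-element array, assuming the values are RGB
// components. Out-of-range components are clamped into 0..255.
inline std::array<int, 3> verify_rgb(std::array<int, 3> val) {
  for (int& component : val) {
    if (component < 0) {
      component = 0;
    }
    if (component > 255) {
      component = 255;
    }
  }
  return val;
}
```

Because the array is passed and returned by value, the verifier works on the copy made outside the sandbox, so the sandbox cannot change the data after it has been checked.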
What happens if we can't figure out a verifier?
In larger codebases, it may not be easy to find a suitable verifier for ret. This may be because ret is used in a lot of places and it is difficult to figure out all the locations where it is used. Broadly, there are a few different strategies to deal with this that we describe below. Ultimately, you probably want to use a mix of these strategies to make this a tractable problem.
Strategy 1: Defer verification
The first strategy to simplify identification of verifiers is to defer it. In fact, as a general rule you should do this wherever possible. This is because the more you defer verification, the further into your program you move the verifier. And the further into your program you go, the clearer it usually is what a tainted value is used for, and thus the easier it is to write the verifier.
To make this easier, RLBox allows a number of operations on tainted values directly (specifically, in scenarios where RLBox can ensure their safety).
For example, if you add a line in the application
auto ret_plus_1 = ret + 1;
and attempt to compile your code, you will see that the compiler does not report any error on this line. This is because RLBox permits this operation and ret_plus_1 is now a tainted_mylib<unsigned int>, i.e., RLBox has propagated the tainting; this is a "tainted computation".
Indeed there are a number of operations that are supported as "tainted computations", and produce a new tainted value. As a few examples:
- Arithmetic on a tainted value.
- Dereferencing a tainted pointer (a tainted pointer always points to memory in the sandbox. RLBox automatically checks this prior to the dereference to ensure safety).
- Comparing a tainted pointer to nullptr.
- Using a tainted<unsigned int> to index a tainted<int[]>.
There are also operations that are not allowed for safety reasons:
- Using a tainted<unsigned int> to index an array in the host, i.e., int[]. This can lead to an out-of-bounds array access if allowed.
- Branching on a tainted value, i.e., if conditions with tainted values in the condition, or for loops with tainted values in the condition.
You can learn more about this in the advanced topics chapter.
Trying to figure out what operations are allowed or not may seem tricky, but there is a straightforward approach. Try the operation! If RLBox doesn't throw a compilation error, you can be assured it is safe and you can defer this.
Strategy 2: Verification for local use
Another option is to simply verify a tainted variable for one use case at a time. Rather than verifying the tainted value for the rest of the program, verify it for the next use case only, and do not remove the tainting.
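The sketch below illustrates this pattern with a minimal stand-in wrapper. The mock_tainted class is hypothetical and is NOT RLBox's implementation; it only mimics the copy_and_verify call shape so that the example compiles on its own. The two uses (a 4-entry table index and a percentage) are likewise assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Minimal stand-in for a tainted value, for illustration only. It is not
// RLBox's implementation; it only mimics the copy_and_verify call shape.
template <typename T>
class mock_tainted {
 public:
  explicit mock_tainted(T v) : value_(v) {}

  // Runs the verifier on a copy; the wrapped value itself stays tainted.
  template <typename F>
  auto copy_and_verify(F&& verifier) const {
    return std::forward<F>(verifier)(value_);
  }

 private:
  T value_;
};

// Use 1: the value indexes a 4-entry table, so check it against that bound.
inline std::size_t as_table_index(const mock_tainted<unsigned>& t) {
  return t.copy_and_verify(
      [](unsigned v) -> std::size_t { return v < 4 ? v : 0; });
}

// Use 2: the value is shown as a percentage, so clamp it to 100.
inline unsigned as_percentage(const mock_tainted<unsigned>& t) {
  return t.copy_and_verify([](unsigned v) { return v > 100 ? 100u : v; });
}
```

Each use site gets its own small verifier with a check that matches that use, and the original value remains wrapped so later uses still have to verify it.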
Strategy 3: Enforce the library contract
Finally, in scenarios where a library's security contract is clearly defined for an output, you could use this contract as a verifier for the tainted data as soon as it is returned by a sandboxed function.
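As a sketch of this strategy, assume a hypothetical library routine whose documentation promises that the returned output length never exceeds the capacity of the buffer it was given. The function below enforces exactly that promise; enforce_length_contract is an illustrative name, and throwing on violation is an assumption about how the application prefers to fail.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical contract: the library promises returned_len <= capacity.
// The verifier enforces that promise and nothing more.
inline std::size_t enforce_length_contract(std::size_t returned_len,
                                           std::size_t capacity) {
  if (returned_len > capacity) {
    // The library broke its contract; treat this as a sandbox failure.
    throw std::runtime_error("sandboxed library violated its length contract");
  }
  return returned_len;
}
```

With RLBox, logic like this would sit inside the verifier passed to copy_and_verify, with capacity captured from the host side rather than taken from the sandbox.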
Getting raw pointers into the sandbox memory
Typically, RLBox does not let you create raw pointers into sandbox memory, i.e., pointers of the form char*. Rather, the pointers will be wrapped as tainted<char*>. However, there may be certain scenarios where you really need a raw pointer into sandbox memory.
You can do this with the unverified_safe_pointer_because API. This converts a tainted pointer to a raw pointer with only the minimal verification of checking that the pointer is within the sandbox boundary.
The details of how to use this API are provided in the section on untainting byte buffers without copying.
Miscellaneous troubleshooting
Assigning 0 or NULL to tainted pointers is not supported
Unfortunately, NULL in C++ is typed as an integer, and this makes it indistinguishable from any other integer. So, RLBox does not allow zeroing out pointers with 0 or NULL. You can, however, assign a null pointer using the C++ nullptr keyword.
I cannot call copy_and_verify on tainted<void*>
RLBox does not allow copy_and_verify on tainted<void*> as it could lead to some anti-patterns in verifiers. Cast it to a different tainted pointer with sandbox_reinterpret_cast and then call copy_and_verify. Alternatively, you can use the UNSAFE_unverified API to do this without casting.
Alternate isolation backends
WebAssembly with wasm2c is just one way to isolate running code. We have focussed on this so far as this is the approach that is being used in production in Firefox today. However, RLBox's plugin model is completely general, and can be configured to support other isolation backends as well.
Note that the below plugins are experimental and are not actively maintained (which may mean they have compilation bugs that you'd have to fix). This is meant purely as a reference for you to write your own plugins.
Experimental/previously-used RLBox plugins for isolation
- Using LFI (see this chapter for more details): https://github.com/UT-Security/rlbox_lfi_sandbox
- Using WebAssembly through Lucet: https://github.com/PLSysSec/rlbox_lucet_sandbox
- Using WebAssembly through Wamr: https://github.com/PLSysSec/rlbox_wamr_sandbox
- Using WebAssembly through Wasmtime: https://github.com/PLSysSec/rlbox_wasmtime_sandbox
- Using Google's Native Client: https://github.com/PLSysSec/rlbox_nacl_sandbox
Enforcing isolation with the LFI sandbox
Note: this section is written as a continuation of the tutorial, as an alternative to using RLBox with wasm2c. It's best if you follow the tutorial up to that point and then continue reading below.
The noop backend makes it easy to add security checks. However, it does not enforce isolation. To finish sandboxing your library, we will need to:
- Update the application main.cpp to use the lfi sandbox backend instead of noop.
- Compile our library, e.g., mylib.c, using the LFI compiler to a native sandboxed object, and link this into our application.
We will look at each of these steps next and end with instructions on how you can try this out.
Modifying the application to use the lfi RLBox plugin/backend
Making this change is very simple with RLBox. In fact, it can be done exclusively in the boilerplate. Here is the boilerplate to use the lfi backend.
// We're going to use RLBox in a single-threaded environment.
#define RLBOX_SINGLE_THREADED_INVOCATIONS
#include "rlbox.hpp"
#include "rlbox_lfi_sandbox.hpp"
using namespace rlbox;
extern "C" {
extern uint8_t mylib_start[];
extern uint8_t mylib_end[];
}
// Define base type for mylib using the lfi sandbox
RLBOX_DEFINE_BASE_TYPES_FOR(mylib, lfi);
You'll probably notice that there are only a handful of changes.
- We now use rlbox_lfi_sandbox.hpp instead of rlbox_noop_sandbox.hpp.
- The sandbox type, which is the second parameter to the macro RLBOX_DEFINE_BASE_TYPES_FOR, has now changed to lfi from noop.
- The boilerplate for RLBOX_USE_STATIC_CALLS has been removed, as the lfi backend doesn't need this, unlike, say, the noop backend.
- The lfi backend/plugin requires an extra piece of boilerplate, which is to define the beginning and end of the lfi library that has been embedded into the application as a datablob. This is shown as mylib_start and mylib_end here. We will discuss how to generate these variables in the next section.
These are mostly mechanical changes and are straightforward. Modifying the build is perhaps slightly more challenging as building lfi libraries involves multiple steps.
Modifying the build to produce the lfi sandboxed library
To show how we will update the build, we will use two CMakeLists.txt files as a reference
- The CMakeLists used by the noop sandbox
- The CMakeLists used by the lfi sandbox
As you can see the lfi CMakeLists is quite a bit longer. Below, we will give a high-level overview of the steps, so you can follow what is happening in the LFI build.
To build and use the LFI sandboxed library, we need several additional repos/tools
- We will need the rlbox lfi plugin/backend
- We will need a version of clang that can produce LFI files. We will use the prebuilt binaries of LFI clang available in the official repo, which bring the compiler, the custom libc, and so on.
- Finally, we need the lfi-runtime, which supports the loading and execution of LFI runtimes.
After we download these repos, we can then take the following steps
- Build the lfi runtime. This is a project that can be built using meson. You can read more about how to build lfi in their readme.
- We need to compile our mylib.c to a sandboxed ELF binary (${MYLIB_ELF}) using lfi's clang. The command in the CMakeLists.txt that does this is
PATH=$ENV{PATH}:${lficlang_SOURCE_DIR}/lfi-bin ${lficlang_SOURCE_DIR}/bin/clang ${LFI_SBX_BUILD_TYPE_FLAGS} -Wl,--export-dynamic -static-pie -o ${MYLIB_ELF} ${CMAKE_SOURCE_DIR}/mylib.c -L ${LFI_SYSROOT_PATH} -lboxrt
This command starts by adding the lfi clang folder to the $PATH. ${LFI_SBX_BUILD_TYPE_FLAGS} is just going to be -O0 or -O3. LFI requires that the -Wl,--export-dynamic and -static-pie flags are present in the compilation so that the produced code is a position-independent executable that has a symbol table. Finally, the produced ELF is linked with LFI's in-sandbox runtime, libboxrt.
- Next, we will create a simple assembly file that just embeds the produced lfi binary in a datablob as part of the application. This can be done easily with a file like this, which includes the binary as a datablob between two symbols, ${INCSTUB_FILENAME}_start and ${INCSTUB_FILENAME}_end. We define ${INCSTUB_FILENAME} to be mylib in our example, so we get the required mylib_start and mylib_end symbols.
- Finally, we can now build our application, including the file with the datablob, to embed the lfi-sandboxed library as part of the application. We also have to link in the lfi-runtime's liblfi, which is needed to instantiate and destroy sandboxes.
Building and running the lfi backend
To build this example on your machine, run the following commands
cd rlbox-book/src/examples/lfi-hello-example
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Debug
cmake --build ./build --config Debug --parallel
Then run it to make sure everything is working.
./build/main
You should see the following output:
Hello from mylib
Got array value 7
echo: hi hi!
hello_cb: hi again!
Additional content
This section contains additional resources such as useful recipes, tutorials which may have additional content (but may be slightly out of date) etc.
Modifying a Makefile based project to compile to Wasm
This tutorial assumes you have the rlbox_wasm2c_sandbox git repo in the path $(RLBOX_WASM2C_PATH), and that you have installed wasi-sdk on your computer in the path $(WASI_SDK_PATH).
Build the sources of your library along with the file $(RLBOX_WASM2C_PATH)/c_src/wasm2c_sandbox_wrapper.c. Pass the flags -Wl,--export-all -Wl,--stack-first -Wl,-z,stack-size=262144 -Wl,--no-entry -Wl,--growable-table -Wl,--import-memory -Wl,--import-table to the linker using the wasi-clang compiler. This will produce a wasm module.
To edit an existing Make-based build system, you can run the following commands.
$(WASI_SDK_PATH)/bin/clang --sysroot $(WASI_SDK_PATH)/share/wasi-sysroot $(RLBOX_WASM2C_PATH)/c_src/wasm2c_sandbox_wrapper.c -c -o $(RLBOX_WASM2C_PATH)/c_src/wasm2c_sandbox_wrapper.o
AR=$(WASI_SDK_PATH)/bin/ar \
CC=$(WASI_SDK_PATH)/bin/clang \
CXX=$(WASI_SDK_PATH)/bin/clang++ \
CFLAGS="--sysroot $(WASI_SDK_PATH)/share/wasi-sysroot" \
LD=$(WASI_SDK_PATH)/bin/wasm-ld \
LDLIBS=$(RLBOX_WASM2C_PATH)/c_src/wasm2c_sandbox_wrapper.o \
LDFLAGS="-Wl,--export-all -Wl,--stack-first -Wl,-z,stack-size=262144 -Wl,--no-entry -Wl,--growable-table -Wl,--import-memory -Wl,--import-table" \
make
Using a Wasm module with imported memory and imported tables
By default, RLBox operates on Wasm modules with an imported memory and imported table. This allows RLBox to optimize the memory allocation's alignment for its internal operations. However, you can also use modules with exported memory and exported tables (albeit with some performance penalty).
To do this, adjust the flags during the compilation of your Wasm module to
export memory and tables. Specifically, remove the arguments
-Wl,--import-memory -Wl,--import-table
if present, and use the arguments
-Wl,--export-all -Wl,--export-table
Additional material
-
The best example of how to use RLBox is to see its use in Firefox. The Firefox code search is a great way to do this.
-
Working through the simple library example repo is a good way to get a feel for retrofitting a simple application that uses a potentially buggy library, and is a good next step. The solution is available in the solution folder in the same repo.
-
Documentation of the core RLBox APIs.
-
Short tutorial on using the RLBox APIs. Note that this tutorial uses the old Lucet Wasm compiler.
-
The RLBox test suite itself has a number of examples.
-
Finally, the original academic paper explaining RLBox and its use in Firefox (RLBoxPaper), published at USENIX Security 2020, and the accompanying video explanations are a good way to get an overview of RLBox.