Retrofitting isolation in a simple application
For our tutorial, we're going to be sandboxing a tiny library, mylib. While
this library is very simple, it exercises key features of RLBox, including
calling functions, copying strings into the sandbox, and registering and
handling callbacks from the library.
This first part of the tutorial focuses on modifying our application to add sandboxing; the next part focuses on recompiling our library with Wasm to enforce isolation.
In this example, we're going to use the noop sandbox backend. The noop sandbox does not actually enforce isolation; it is simply a tool that makes it easier to port new libraries to RLBox. It does nothing more than turn our calls into the RLBox sandbox into normal function calls to the library we already have linked into our application.
The reason for this noop backend is that it supports incrementally porting our
application. Instead of having to change all our library interfaces at once (to
account for ABI differences between a sandbox and our normal library) and deal
with the resulting headaches, we can gradually change our function calls from
normal library calls to sandbox calls, and at each step test that our
application continues to work as expected.
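To make the idea of "turning sandbox calls into normal function calls" concrete, here is a minimal sketch of what a noop-style backend boils down to. The names ToyNoopSandbox, invoke, and toy_add are hypothetical and chosen only for illustration; this is not RLBox's actual implementation, which also wraps arguments and return values in tainted types.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a noop-style backend; NOT the RLBox implementation.
struct ToyNoopSandbox {
  // Forward the call straight to the library function linked into the app.
  // No isolation happens here: this is just an ordinary function call.
  template <typename Fn, typename... Args>
  auto invoke(Fn fn, Args&&... args) const {
    return fn(std::forward<Args>(args)...);
  }
};

// Stand-in for a library function (mirrors the shape of mylib's add).
inline unsigned toy_add(unsigned a, unsigned b) { return a + b; }
```

With this sketch, ToyNoopSandbox{}.invoke(toy_add, 3u, 4u) evaluates to 7, exactly as a direct call to toy_add(3u, 4u) would. Because every call site already goes through invoke, a real backend could later be swapped in without touching those call sites again.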
Our example library mylib declares four functions in mylib.h:
#pragma once
#ifdef __cplusplus
extern "C" {
#endif
void hello();
unsigned add(unsigned, unsigned);
void echo(const char* str);
void call_cb(void (*cb) (const char* str));
#ifdef __cplusplus
}
#endif
And implements those functions in mylib.c:
#include <stdio.h>
#include "mylib.h"
void hello() {
  printf("Hello from mylib\n");
}
unsigned add(unsigned a, unsigned b) {
  return a + b;
}
void echo(const char* str) {
  printf("echo: %s\n", str);
}
void call_cb(void (*cb) (const char* str)) {
  cb("hi again!");
}
Boilerplate
To get started, in our main application (main.cpp) let's first import the RLBox library and implement some necessary boilerplate:
// We're going to use RLBox in a single-threaded environment.
#define RLBOX_SINGLE_THREADED_INVOCATIONS
// All calls into the sandbox are resolved statically.
#define RLBOX_USE_STATIC_CALLS() rlbox_noop_sandbox_lookup_symbol
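#include <stdio.h>
#include <cassert>
#include <cstdlib>  // abort
#include <cstring>  // strlen, strncpy
#include <memory>   // unique_ptr
#include <utility>  // move
#include <rlbox/rlbox.hpp>
#include <rlbox/rlbox_noop_sandbox.hpp>
#define release_assert(cond, msg) do { if (!(cond)) { fputs(msg, stderr); abort(); } } while (0)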
#include "mylib.h"
using namespace std;
using namespace rlbox;
// Define base type for mylib using the noop sandbox
RLBOX_DEFINE_BASE_TYPES_FOR(mylib, noop);
// Declare callback function we're going to call from sandboxed code.
void hello_cb(rlbox_sandbox_mylib& _, tainted_mylib<const char*> str);
int main(int argc, char const *argv[]) {
  // ... will fill in shortly ...
  // destroy sandbox
  sandbox.destroy_sandbox();
  return 0;
}
Why the boilerplate? RLBox has support for different kinds of sandboxing
back-ends. In practice we start with the noop sandbox, which is not a real
sandbox, to get our types right, and only at the end change from noop to a
real sandbox like Wasm. This, alas, means the RLBox types are typically generic
in the sandbox type (e.g., rlbox::tainted<T, sandbox_type>); macros like
RLBOX_DEFINE_BASE_TYPES_FOR define simpler types for us (e.g., we can use
tainted_mylib<T>). In this simple example we only use the noop sandbox; we walk
through how to modify this code to use Wasm in the next chapter.
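As a rough illustration of how a macro can hide the backend parameter behind shorter, library-specific names, consider the sketch below. Everything in it (toy_noop_backend, toy_tainted, TOY_DEFINE_BASE_TYPES_FOR, and the generated aliases) is a hypothetical stand-in written for this tutorial; the real RLBOX_DEFINE_BASE_TYPES_FOR macro may be implemented quite differently.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical backend tag and generic tainted type (toy stand-ins, not RLBox's).
struct toy_noop_backend {};

template <typename T, typename Backend>
struct toy_tainted {
  T value;
};

// Define short, library-specific aliases for one chosen backend.
// Token pasting builds names such as toy_tainted_mylib from the arguments.
#define TOY_DEFINE_BASE_TYPES_FOR(lib, backend)                \
  using toy_sandbox_##lib = toy_##backend##_backend;           \
  template <typename T>                                        \
  using toy_tainted_##lib = toy_tainted<T, toy_##backend##_backend>

// After this line, toy_tainted_mylib<T> means toy_tainted<T, toy_noop_backend>.
TOY_DEFINE_BASE_TYPES_FOR(mylib, noop);
```

The appeal of this pattern is that the backend is named in exactly one place: code written against toy_tainted_mylib<T> would not need to change if the macro invocation later named a different backend.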
Creating sandboxes and calling sandboxed functions
Now that the boilerplate is out of the way, let's create a new sandbox and
call the hello function:
// Declare and create a new sandbox
rlbox_sandbox_mylib sandbox;
sandbox.create_sandbox();
// Call the library hello function:
sandbox.invoke_sandbox_function(hello);
We do not call hello() directly. Instead, we use the invoke_sandbox_function()
method. Once we turn on sandboxing, i.e., switch from the noop sandbox to Wasm,
we won't be able to call the function directly anyway (e.g., because Wasm's ABI
might be different from the app's).
Calling sandboxed functions and verifying their return value
Let's now call the add function:
// call the add function and check the result:
auto ok = sandbox.invoke_sandbox_function(add, 3, 4)
  .copy_and_verify([](unsigned ret){
    printf("Adding... 3+4 = %u\n", ret);
    return ret == 7;
  });
printf("OK? = %d\n", ok);
This call is a bit more interesting. First, we call add with arguments. Since
these arguments are primitive types, RLBox doesn't impose any restrictions.
Second, RLBox ensures that the unsigned return value that add returns is
tainted and thus cannot be used without verification. Here, we call the
copy_and_verify() method, which copies the value into application memory and
runs our verifier function:
[](unsigned ret){
  printf("Adding... 3+4 = %u\n", ret);
  return ret == 7;
}
This function (lambda) simply prints the tainted value and returns true if it
is 7. A compromised library could return any value, and if we used this value
to, say, index an array, this could potentially introduce an out-of-bounds
memory access.
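The array-indexing scenario can be sketched with a toy wrapper. ToyTainted and lookup below are hypothetical names for illustration only; RLBox's real tainted types are far richer, and the only thing borrowed here is the idea that a value leaves the wrapper solely through a verifier.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical toy wrapper; NOT RLBox's tainted type.
template <typename T>
class ToyTainted {
public:
  explicit ToyTainted(T value) : value_(value) {}

  // The only way to get the value out is through a caller-supplied verifier.
  template <typename Verifier>
  auto copy_and_verify(Verifier&& verifier) const {
    return std::forward<Verifier>(verifier)(value_);
  }

private:
  T value_;
};

// Use an untrusted index only after checking it against the table bounds.
// Assumes table_len > 0 so that index 0 is a valid fallback.
inline int lookup(const int* table, std::size_t table_len,
                  ToyTainted<std::size_t> untrusted_index) {
  assert(table_len > 0);
  std::size_t index = untrusted_index.copy_and_verify(
      [table_len](std::size_t i) { return i < table_len ? i : 0; });
  return table[index];
}
```

Falling back to index 0 is just one possible policy; a verifier could equally abort or report an error. The point is that the bounds check sits at the single place where the untrusted value becomes usable.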
Calling functions with (tainted) strings
Let's now call the echo function, which takes a slightly more interesting
argument: a string. Here, we can't simply pass a string literal as an argument:
the sandbox cannot access application memory where this would be allocated.
Instead, we must allocate a buffer in sandbox memory and copy the string we
want to pass to echo into this region:
// Call the library echo function
const char* helloStr = "hi hi!";
size_t helloSize = strlen(helloStr) + 1;
tainted_mylib<char*> taintedStr = sandbox.malloc_in_sandbox<char>(helloSize);
strncpy(taintedStr.unverified_safe_pointer_because(helloSize, "writing to region"),
        helloStr, helloSize);
Here taintedStr is a tainted string: it lives in the sandbox memory and could
be written to by the (compromised) library code concurrently. In general, it's
unsafe for us to use tainted data without verification since it could be
attacker controlled. In this particular case, though, we just want to copy data
(helloStr specifically) to taintedStr. We do this by using
unverified_safe_pointer_because to essentially cast taintedStr to a char*
without any verification. This is safe because we are just copying helloStr to
sandbox memory: at worst, the sandboxed library can overwrite the memory region
pointed to by taintedStr and crash when it tries to print it. [1]
Note: Internally, unverified_safe_pointer_because is not actually just a cast. It also ensures (1) that the pointer is within the sandbox and (2) that accessing helloSize bytes off the pointer would stay within the sandbox boundary.
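The two checks described in the note amount to a range-containment test. The sketch below shows one way such a test could be written; range_in_region and its parameters are hypothetical, and this is not how RLBox implements the check internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if [ptr, ptr + len) lies entirely inside
// the region [base, base + size). Addresses are passed as integers.
// The check uses subtraction only, so the arithmetic cannot overflow.
inline bool range_in_region(std::uintptr_t base, std::size_t size,
                            std::uintptr_t ptr, std::size_t len) {
  if (ptr < base) return false;           // starts before the region
  std::uintptr_t offset = ptr - base;     // distance from the region start
  if (offset > size) return false;        // starts past the region end
  return len <= size - offset;            // the whole access fits
}
```

A naive check such as ptr + len <= base + size can wrap around for very large len values, which is why this sketch compares len against the remaining space instead.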
It's worth mentioning that the string "writing to region" does not have any
special meaning in the code. Rather, the RLBox API asks you to provide a
free-form string that acts as documentation. Essentially you are providing a
string that says "it is safe to remove the tainting from this type because...".
Such documentation may be useful to other developers who read your code. In
the above example, a write to the sandbox region cannot cause a memory safety
error in the application, so it's safe to remove the taint.
Now, we can just call the function and free the allocated string:
sandbox.invoke_sandbox_function(echo, taintedStr);
sandbox.free_in_sandbox(taintedStr);
Sneak peek of an upcoming feature: In an upcoming version of RLBox, transferring a buffer into the sandbox will be much simpler with a new TransferBuffer abstraction. To get a sneak preview of this, take a look at the usage in Firefox.
Registering and handling callbacks
Finally, let's call the call_cb function. To do this, let's first define a
callback for the function to call. We declared our callback in the boilerplate but never defined the function, so let's do that at the end of the file:
void hello_cb(rlbox_sandbox_mylib& _, tainted_mylib<const char*> str) {
  auto checked_string =
    str.copy_and_verify_string([](unique_ptr<char[]> val) {
      release_assert(val != nullptr && strlen(val.get()) < 1024, "val is null or greater than 1024\n");
      return move(val);
    });
  printf("hello_cb: %s\n", checked_string.get());
}
This callback is called with a tainted string. To actually use the tainted
string we need to verify it. To do this, we use the string verification function
copy_and_verify_string() with a simple verifier:
str.copy_and_verify_string([](unique_ptr<char[]> val) {
  release_assert(val != nullptr && strlen(val.get()) < 1024, "val is null or greater than 1024\n");
  return move(val);
});
This verifier returns (moves) the string if it is not null and its length is less than 1 KB; otherwise release_assert aborts the program. In the callback we simply print this string.
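Aborting is one policy; another is to reject the string and let the caller decide what to do. The sketch below is a hypothetical variant of the verifier written as a free function so it can be exercised without RLBox. The names verify_short_string and make_owned are assumptions made for this illustration and are not part of the tutorial's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical non-aborting variant of the verifier: returns an empty pointer
// when the string is missing or 1024 bytes or longer, else hands the copy back.
inline std::unique_ptr<char[]> verify_short_string(std::unique_ptr<char[]> val) {
  constexpr std::size_t kMaxLen = 1024;
  if (val == nullptr || std::strlen(val.get()) >= kMaxLen) {
    return nullptr;
  }
  return val;  // ownership moves to the caller
}

// Helper for trying the verifier out: copy a C string into an owned buffer.
inline std::unique_ptr<char[]> make_owned(const char* s) {
  std::size_t n = std::strlen(s) + 1;  // include the terminating NUL
  std::unique_ptr<char[]> buf(new char[n]);
  std::memcpy(buf.get(), s, n);
  return buf;
}
```

A caller of this variant has to check the result for null before printing it, which is the price of not aborting inside the verifier.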
Let's now continue back in main. To call call_cb with the callback, we first
need to register the callback (otherwise RLBox will disallow the
library-application call) and then pass the registered callback to the call_cb
function:
// register callback and call it
auto cb = sandbox.register_callback(hello_cb);
sandbox.invoke_sandbox_function(call_cb, cb);
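One way to picture why registration matters is a table of approved callbacks: the sandboxed side can only name an entry in the table, and anything not in the table is refused. The sketch below models that idea with hypothetical names (ToyCallback, ToyCallbackTable, dispatch); it is a mental model under those assumptions, not a description of RLBox's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Signature of the callbacks in this toy model.
using ToyCallback = void (*)(const char*);

// Hypothetical registry: sandboxed code may only refer to callbacks by slot.
class ToyCallbackTable {
public:
  // Register a callback and return the slot number handed to the sandbox.
  std::size_t register_callback(ToyCallback cb) {
    slots_.push_back(cb);
    return slots_.size() - 1;
  }

  // Handle a sandbox-initiated call. Slots that were never registered are
  // rejected instead of being treated as function pointers.
  bool dispatch(std::size_t slot, const char* arg) const {
    if (slot >= slots_.size() || slots_[slot] == nullptr) return false;
    slots_[slot](arg);
    return true;
  }

private:
  std::vector<ToyCallback> slots_;
};
```

In this model a compromised library that invents a slot number gets a refusal rather than an arbitrary jump into application code, which is the property registration is meant to provide.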
Build and run
If you haven't installed RLBox, see the Install chapter.
Clone this book's repository:
git clone https://github.com/PLSysSec/rlbox-book
cd rlbox-book/src/chapters/examples/noop-hello-example
Build:
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Release
cmake --build ./build --config Release --parallel
Run:
$ ./build/main
Hello from mylib
Adding... 3+4 = 7
OK? = 1
echo: hi hi!
hello_cb: hi again!
[1] For single-threaded applications, the attacker can't overwrite the pointer because we're not calling into the sandbox before calling strncpy.