Application sandboxes are becoming more and more popular. There are multiple schools of thought and multiple implementations. Let’s see how to use FreeBSD’s Capsicum.


Capsicum was integrated into base and first released in FreeBSD 9.0. Google also did some initial work to port it to the Linux kernel. It was developed at the University of Cambridge by a team led by Robert Watson. It’s a lightweight sandboxing framework that can limit an application’s capabilities. It is based on file descriptors, such as those for standard file system objects or sockets, and it also provides a notion of processes as file descriptors through the pdfork(2) syscall.

An application has to be rewritten to make use of the framework, but the API is very easy to use. If your application depends on multiple libraries that are not compatible with it, there is no simple way to use it without rewriting all of them. Even libc will not help you with this task. The idea behind this is to keep the Trusted Computing Base as small as possible. Once the application calls cap_enter(2), the capabilities framework starts the enforcement. You should call it as early as possible, after making sure that all required descriptors are already open or that some of them carry a right that lets them create new ones, e.g. CAP_ACCEPT on a listening socket, which yields client sockets with a limited set of capabilities. All limits persist across fork and exec, although you need to rewrite the application to use pdfork(2) and fexecve(2).
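To make the "open first, then enter" ordering concrete, here is a minimal sketch. The helper name read_after_cap_enter() is made up for this post, and the #ifdef guard is only there so the snippet also compiles on systems without Capsicum, where it simply reports ENOSYS.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

// Hypothetical helper (not part of Capsicum): open `path` while global
// namespaces are still reachable, enter capability mode, then read from
// the descriptor that was opened earlier.
// Returns the number of bytes read, or -1 with errno set.
ssize_t
read_after_cap_enter(const char *path, char *buf, size_t len)
{
#ifdef __FreeBSD__
    // Before cap_enter(2), path-based open(2) still works.
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return (-1);

    // Enter the sandbox; there is no way back for this process.
    if (cap_enter() == -1) {
        int enter_errno = errno;
        close(fd);
        errno = enter_errno;
        return (-1);
    }

    // The descriptor opened earlier remains usable inside the sandbox.
    ssize_t n = read(fd, buf, len);
    int read_errno = errno;
    close(fd);
    errno = read_errno;
    return (n);
#else
    // Assumption: on non-FreeBSD systems we just report "not supported".
    (void)path;
    (void)buf;
    (void)len;
    errno = ENOSYS;
    return (-1);
#endif
}
```

Keep in mind that the whole process stays in capability mode after the call, so in a real program this belongs in the start-up path rather than in a general-purpose utility function.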

Apart from the standard operations like read(2) and write(2), you can also limit the subsets of ioctl(2) and fcntl(2) commands that will be allowed in the sandbox. If you enable CAP_IOCTL on a descriptor but don’t limit its commands, any ioctl request can be made through this fd, which exposes a vulnerability.

The basic operations are cap_enter(2), which puts the app in the sandbox, and the cap_rights* functions: cap_rights_init(3) initialises a cap_rights_t structure with the requested capabilities, and cap_rights_limit(2) applies them to the fd. To limit ioctls or fcntls, use cap_ioctls_limit(2) and cap_fcntls_limit(2).
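Since a descriptor with CAP_IOCTL and no command list is easy to overlook, a small audit helper can be handy. The sketch below relies on cap_ioctls_get(2), which, as far as I know, returns CAP_IOCTLS_ALL when every ioctl command is allowed on the descriptor. The name ioctls_unrestricted() is hypothetical, and the #ifdef guard only keeps the snippet compilable on systems without Capsicum.

```c
#include <errno.h>
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

// Hypothetical audit helper (not part of Capsicum):
// returns 1 if any ioctl command may be issued on fd,
//         0 if the descriptor is limited to an explicit (possibly empty) list,
//        -1 on error with errno set.
int
ioctls_unrestricted(int fd)
{
#ifdef __FreeBSD__
    unsigned long first[1];

    // Only the return value matters here: it is the total number of
    // allowed commands, or CAP_IOCTLS_ALL when there is no limit.
    ssize_t n = cap_ioctls_get(fd, first, 1);
    if (n == -1)
        return (-1);
    return (n == CAP_IOCTLS_ALL ? 1 : 0);
#else
    // Assumption: on non-FreeBSD systems we just report "not supported".
    (void)fd;
    errno = ENOSYS;
    return (-1);
#endif
}
```

One way to use it is to run it over every descriptor you keep open right before cap_enter(2) and refuse to start if any of them still returns 1.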

In the example below, we limit the capabilities of stdin, allowing only read(2), fstat(2) and the subset of ioctl(2) and fcntl(2) commands required to get terminal capabilities and window size:

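#include <sys/param.h>      // nitems()
#include <sys/capsicum.h>   // cap_rights_*(), cap_ioctls_limit(), cap_fcntls_limit()
#include <sys/ioctl.h>      // TIOCGETA, TIOCGWINSZ

cap_rights_t rights;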
// ioctl commands we want to allow
unsigned long cmds[] = { TIOCGETA, TIOCGWINSZ };

cap_rights_init(&rights, CAP_FCNTL, CAP_FSTAT, CAP_IOCTL, CAP_READ);

// limit base capabilities
cap_rights_limit(0, &rights);

// allow selected ioctls
cap_ioctls_limit(0, cmds, nitems(cmds));

// allow selected fcntls
cap_fcntls_limit(0, CAP_FCNTL_GETFL);

Calling any other ioctl(2) or fcntl(2) command on that descriptor will fail with the error ENOTCAPABLE.
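If you want to see which error the kernel actually reports for a command that is not on the list, a small probe like the one below can help. It is only a sketch: denied_fcntl_errno() is a made-up name, it works on a throwaway pipe instead of stdin, and the #ifdef guard is there so it still compiles where Capsicum is missing. On the FreeBSD versions I am aware of, a right missing from a descriptor is reported as ENOTCAPABLE, while ECAPMODE is used for system calls that are not permitted in capability mode at all.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

// Hypothetical probe (not part of Capsicum): limit the read end of a pipe
// to CAP_READ plus fcntl(F_GETFL), then attempt fcntl(F_SETFL).
// Returns the errno of the first failing call, or 0 if nothing failed
// (which would mean the limit was not enforced).
int
denied_fcntl_errno(void)
{
#ifdef __FreeBSD__
    int fds[2];
    int result;
    cap_rights_t rights;

    if (pipe(fds) == -1)
        return (errno);

    cap_rights_init(&rights, CAP_READ, CAP_FCNTL);
    if (cap_rights_limit(fds[0], &rights) == -1 ||
        cap_fcntls_limit(fds[0], CAP_FCNTL_GETFL) == -1) {
        // Setting up the limits failed, e.g. no Capsicum in the kernel.
        result = errno;
    } else if (fcntl(fds[0], F_SETFL, O_NONBLOCK) == -1) {
        // F_SETFL is not on the allowed list, so this is the expected path.
        result = errno;
    } else {
        result = 0;
    }

    close(fds[0]);
    close(fds[1]);
    return (result);
#else
    // Assumption: on non-FreeBSD systems we just report "not supported".
    return (ENOSYS);
#endif
}
```

Printing strerror() of the returned value is a quick way to check the behaviour on your own system.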

Let’s assume that an attacker has gained access to the application and can run custom code, for example execute a shell. Let’s see how to protect against that using Capsicum:

// we're entering the sandbox
if (cap_enter() == -1) {
    err(1, "cap_enter()");
}

// [...]

// classic exec isn't possible in sandboxed app
if (execl("/bin/sh", "-", (char *)NULL) == -1) {
    err(1, "execl()");
}

The output:
cap_test: execl(): Not permitted in capability mode

In the next post, I’ll look into how to limit sockets.
