Spons & Shields: Practical Isolation For Trusted Execution: Vasily A. Sartakov Daniel O'Keeffe David Eyers
VEE ’21, April 16, 2021, Virtual, USA Vasily A. Sartakov, Daniel O’Keeffe, David Eyers, Lluís Vilanova, and Peter Pietzuch
Table 1. Isolation approaches for trusted execution (✗/✓ indicates requirement partially satisfied).

Approach                                            Description                                   (R1) Fine-grained   (R2) Efficient   (R3) POSIX      (R4) TEE
                                                                                                  isolation           memory sharing   compatibility   implementable
Domains: ERIM [78], Shred [14], CubicleOS [67]      Process-level domains with isolated threads   ✗/✓                 ✗                ✓               ✓
well-known techniques [35, 41, 79, 86]. We also assume that the TEE OS and the compiler are implemented correctly, and are thus part of our runtime TCB. Our goal is to protect untrusted application components (e.g., libraries or processes) from each other, and also protect the TEE OS from them.

Intra-TEE isolation approaches must satisfy the following requirements:

(R1) Fine-grained isolation. The approach should provide primitives to compartmentalise the components of TEE applications [22]. Compartmentalisation should be applicable to both processes and libraries with little developer intervention and low performance overhead. In addition, the library OS inside the TEE should be regarded as another (set of) software components, thus unifying user/kernel isolation.

(R2) Efficient memory sharing. Inter-component communication is performance critical for many applications. The approach should support efficient shared memory communication between a subset of application components when required, while isolating components from the rest of the application and the TEE OS.

(R3) Compatible with existing POSIX applications. The approach should provide abstractions that are compatible with existing applications built from processes, threads and types of IPC. The primitives should be available at runtime and not impose restrictions on the number of execution and isolation units (e.g., threads, processes and compartments), only subject to the available memory.

(R4) Implementable within TEEs. Compartmentalisation should be available inside TEEs and be compatible with their security model. In particular, the soundness of compartments should not depend on support from the untrusted host or access to hardware features not widely available on TEEs.

2.4 Existing Isolation Approaches

Since a single process can host more than one TEE, a strawman solution for isolation is to spawn multiple TEEs [12, 68, 73]. A multi-TEE design, however, must use encrypted communication between TEEs. Such communication may require partial redevelopment of an application (e.g., to move from RPCs to message-passing), and the required encryption can slow down data exchange by up to 10× [68]. A multi-TEE design thus does not satisfy R2 or R3 due to the extra development effort and performance costs.

In Tab. 1, we compare other approaches for intra-process isolation of userspace code (see §7 for more related work).

Intra-process domains. Different hardware mechanisms can be used to compartmentalise processes. Intel MPK [29] enforces domains inside a process by assigning tags to memory pages. Similarly, tags can be assigned to threads, binding them within particular domains. Shred [14] (which uses Arm domains, a mechanism similar to Intel MPK), ERIM [78] and CubicleOS [67] introduce system abstractions for MPK, but only a few (16) isolated contexts can be used, contradicting R3. libMPK [55] lifts this limitation by virtualising protection keys but requires trusting the OS kernel, contradicting R4.

Hardware isolation extensions. Researchers have also proposed new hardware extensions for isolation. IMIX [18] introduces in-memory isolation for x86 that allows developers to mark memory pages as security sensitive. CHERI [83] and CODOMs [80] introduce hardware-supported capability systems, which support program compartmentalisation. All these approaches rely on hardware extensions unavailable on commodity TEE platforms, contradicting R4.

Kernel-enforced isolation. The OS kernel can isolate parts of a process using its own primitives and the MMU. Lightweight Contexts (LwCs) [42] are an abstraction for intra-process isolation. Each LwC has its own heap and stack but can access only limited memory ranges. Switching between LwCs involves the OS kernel, as it changes virtual memory mappings, file table entries and more. As with other proposals such as SeCage [43] or Secure Memory Views [27], for some TEEs, this requires relying on a trusted host OS/hypervisor, violating R4.

Compiler instrumentation. TEE code can be instrumented by the compiler to protect pointers. Using Intel MPX [61] or SGXBounds [38], developers add checks to all pointer-related
operations. MPX uses hardware bounds registers to check buffers; SGXBounds encodes buffer sizes into unused bits of SGX TEE pointers. Both can protect SGX TEE code but do not offer a programming abstraction for existing multi-process applications, contradicting R3.

Compiler instrumentation can also protect code at a coarser granularity. ConfLLVM [8] creates two partitions inside a process, one trusted and one untrusted. It instruments the untrusted partition and guarantees that its code can never reach the trusted one. It only supports two partitions, contradicting R3. Occlum [72] supports multi-process isolation inside SGX TEEs using MPX. However, it does not support library isolation (R1) and only supports a very limited form of memory sharing between pairs of processes with overlapping memory ranges (R2). Occlum also relies on the host OS for scheduling and synchronisation, reducing performance through costly TEE transitions and making it possible to introduce and exploit races in otherwise safe code (R4).

In summary, while many techniques have been proposed to isolate userspace components, they either require additional hardware, the involvement of the untrusted host OS kernel, or do not offer a suitable programming abstraction.

3 Spons and Shields

This section introduces Spons and Shields, the core abstractions in SSF to isolate execution in SGX TEEs. We give an overview of Spons and Shields (§3.1), describe their usage (§3.2) and explain their API (§3.3).

3.1 Overview

Fig. 3 shows a TEE with multiple Spons and Shields to protect the NGINX web server and the OpenSSL library in our example healthcare scenario. Spons encapsulate execution contexts: executable code with a set of known function entry points, per-thread contexts and stacks, and a heap allocator. SSF supports two types of Spons: passive Spons (SponSSL) encapsulate arbitrary code components with entry point functions specified by the developer (SSL_read in the example); active Spons (e.g., Sponweb in Fig. 3) are used to encapsulate POSIX processes (with their main entry point).

Figure 3. Anatomy of Spons and Shields [diagram labels: passive Spon OpenSSL (SponSSL), active Spon NGINX (SponWeb), active Spon PHP (SponPHP), ShieldSSL, Shieldweb, ShieldPHP, LibOS, TEE OS, context switch on SSL_read(buf)]

All user code is associated with a Spon, which serves both as an execution context handle and as the minimum unit of memory protection. Memory protection policies are ex[…]ple Spons can be assigned to the same Shield), and by defining a hierarchical nested relationship between Shields. SSF then enforces the following invariant: Spon S is allowed to access all the code and data of all Spons contained in the Shield immediately enclosing S (recall that a Spon encapsulates the code, data, and heap associated with it).

The TEE OS executes on a default, outermost Shield that is also assigned by default to all Spons that did not specify a Shield. To make threading more efficient and secure against certain attacks from the host, each Spon has a set of user-level threads multiplexed over host OS threads [4, 53, 84]. Invoking a function of another Spon through the TEE OS results in a user-level context switch (see §4). Note that passive Spons are called synchronously, but contain independent thread stacks to preserve isolation from their calling Spons.

SSF instruments Spon code to check that every memory access lies within its assigned Shield. Spons cannot directly access the TEE OS; instead, SSF deploys a callback table on each Spon with pointers to trusted trampolines into the TEE OS. Calls into the TEE OS thus have no context-switching overhead, and the trusted TEE OS itself is not instrumented. Note that the TEE OS is hardened to prevent unauthorised cross-Shield interactions (see §4.3), and our threat model relies on users deploying instrumented application code (see §2.3).

Users can declare Spons and Shields when starting a TEE (by mapping program paths to Spons) to compartmentalise unmodified applications (see §3.3). The same operations are also available to dynamically create Spons and Shields inside a TEE, allowing more sophisticated use cases (see §3.2).

3.2 Use Cases for Spon and Shield Separation

The simplest use case for SSF is to have one Shield per Spon – a symmetric configuration, which achieves the equivalent of conventional process-based isolation. This configuration works between processes of the same program, e.g., processes isolating network connection handling [51, 76], or between the web server and PHP processes in our healthcare scenario.

Conversely, SSF supports multiple processes sharing memory in the form of “one Shield for multiple Spons”. Shared memory is not readily supported by existing single-TEE [72] or multi-TEE solutions [12] when isolating processes from each other. For PostgreSQL with SSF (see §6.1), each DBMS process is placed on its own Spon but assigned to a single Shield, allowing access to shared memory. This policy is weaker than process-based isolation (all DBMS processes can access each other, not just the shared memory region), but […]
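
The access invariant of §3.1 and the symmetric and shared configurations of §3.2 can be illustrated with a small model. The following is a minimal Python sketch, not SSF's implementation: the `Shield` and `Spon` classes and the `can_access` helper are hypothetical names introduced here, and the sketch assumes that "contained in a Shield" also covers Spons placed in Shields nested inside it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Shield:
    """Hypothetical model of a Shield; `parent` is the immediately enclosing Shield."""
    name: str
    parent: Optional["Shield"] = None  # None for the outermost ("zero") Shield

    def encloses(self, other: "Shield") -> bool:
        """True if `other` is this Shield or is nested inside it at any depth."""
        node: Optional[Shield] = other
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


@dataclass(eq=False)
class Spon:
    """Hypothetical model of a Spon assigned to exactly one Shield."""
    name: str
    shield: Shield


def can_access(subject: Spon, target: Spon) -> bool:
    """Assumed reading of the invariant: a Spon may access every Spon
    contained in the Shield that immediately encloses it."""
    return subject.shield.encloses(target.shield)


# Outermost Shield, used by the TEE OS.
zero = Shield("zero")

# Symmetric configuration: one Shield per Spon (process-like isolation).
nginx = Spon("nginx", Shield("web", parent=zero))
php = Spon("php", Shield("php", parent=zero))

# Shared configuration: several Spons assigned to a single Shield.
shield_db = Shield("db", parent=zero)
pg1 = Spon("postgres#1", shield_db)
pg2 = Spon("postgres#2", shield_db)
tee_os = Spon("tee-os", zero)

print(can_access(nginx, php))     # separate Shields: isolated
print(can_access(pg1, pg2))       # same Shield: memory can be shared
print(can_access(tee_os, nginx))  # outermost Shield encloses all others
```

In this model, the shared configuration trades isolation for sharing in the same way the text describes: every Spon in the database Shield can reach every other one, not only a designated shared region.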
Example of ahead-of-time configuration: Let us consider how to deploy the healthcare application scenario without code changes: (1) use alloc_shield to create three non-nested Shields, Shieldweb, ShieldPHP, and ShieldDB; (2) register the Spons for all components with register_spon, assigning each to their own Shield; and (3) when the application starts, each call to execve triggers the creation of the necessary active Spons and Shields, and invokes their main function.

4 SSF Implementation

Next we explain how Spons and Shields are implemented in SSF. We describe the memory layout (§4.1), how memory accesses are constrained (§4.2), how the TEE OS interface is protected (§4.3), how to support multi-threaded execution (§4.4) and how to deploy applications with Spons (§4.5).

4.1 Memory Layout

In SSF, applications inside an SGX TEE consist of three parts (see §2.2): (i) the deployed application, (ii) a shared library with a standard C library interface (musl [50]), and (iii) the TEE OS kernel (LKL [60]). A TEE OS layer combines these components and provides SGX-related functionality, such as a host interface, user-level threading and locking.

The SGX TEE starts executing an init program that is linked against all the TEE OS components, and is contained in the outermost “zero” Shield, which has access to all the TEE memory (not shown for brevity). Fig. 5 shows the memory layout of the init program and the Spons and Shields that it creates by sub-dividing the available TEE memory. Each Spon has its own text, data, and bss segments loaded into the memory of its respective Shield, as well as an independent dynamic memory sub-allocator and per-thread stack.

Figure 5. Memory layout of an application with SSF [diagram labels: Enclave; TEE OS/App data/code with TEE OS interface, Linux Kernel Library and Host Call Interface; Shield 0, Shield 1, Shield 2; Spon 1, Spon 2 and Spon 3 with stack, heap and data/code; Spon binaries (PostgreSQL#1, PHP) linked with spn-musl]

To execute programs inside a Spon, we statically link them with spn-musl, a libC-compatible library for Spons. SSF then instantiates a callback table for each Spon to redirect invocations to required TEE OS functions.

4.2 Execution and Memory Access Isolation

SSF must enforce Shield boundaries without relying on the untrusted host OS (R4). Since Spons and Shields can be […] control flow integrity (CFI) [1, 77], and MPX hardware acceleration [52, 61] to perform bounds checks on memory accesses. The combination of SFI and CFI is aimed at thwarting external attacks that try to bypass the isolation guarantees of SSF [65, 70], and supports unmodified application code.

To protect Shield boundaries, SSF “flattens” their hierarchical relationship, as defined through alloc_shield, into memory ranges in the TEE’s virtual address space. Every non-nested Shield simply gets consecutive memory ranges of a known size, specified by argument size in alloc_shield, whereas each nested Shield recursively gets a portion of the memory range assigned to its immediately enclosing Shield. Note that users are allowed to deploy Spons in the “zero” Shield, in which case no instrumentation is necessary for that code (i.e., it exists in the same Shield as the TEE OS).

Compiler-based bounds-checking. SSF provides a new compiler pass (SSFPass) that adds bounds-checking instructions to every memory access in a Spon; code accessing memory outside the Shield will generate an exception.

SSFPass reserves one MPX bounds register (BND0) that contains the bounds of the currently active Shield, and inserts the necessary lower and upper bounds check instructions (bndcl and bndcu) for every memory access in the Spon. The pass also verifies that instrumented code does not modify the bounds register, which is only updated by the TEE OS when switching between Shields (via bndmk).

SSFPass is implemented using LLVM version 9.0 [39], with roughly 300 lines of C++ code. The pass operates on the intermediate representation (IR) of a Spon program linked with spn-musl (see §4.5) and does not require source code changes. It also supports assembly code if implemented as C inlines, for which it adds argument protection. Note that MPX support cannot be disabled by a malicious host OS because it is controlled by the XCR0 register (managed by the TEE), and SGX ensures its integrity.

To decrease instrumentation overhead, SSFPass elides bounds checks on addresses known to be safe; e.g., temporary stack variables or standard memory-related functions with a known buffer size (such as memcpy, memset, or memcmp), which can be checked just once during their first access. Further compiler optimisations that remove unnecessary instrumentation could be added to SSFPass [72, 87].

Enforcing control flow integrity. SSFPass leverages LLVM’s fine-grained forward-edge CFI [77] to restrict indirect function calls. It enforces that function calls take place using a function of the correct dynamic type, matching the static type originating from the call [44].

This approach provides limited CFI guarantees with “forward-edge protection” (i.e., calls), but not “backward-edge […]
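
The flattening of Shields into address ranges and the per-access bounds checks described in §4.2 can be sketched as follows. This is a minimal Python model under stated assumptions, not SSFPass itself: `Range`, `layout_consecutive`, `carve`, `check_access` and `ShieldViolation` are hypothetical names, the addresses and sizes are made up, and the two comparisons only stand in for the lower and upper bounds check instructions that the real pass inserts.

```python
from dataclasses import dataclass
from typing import List


class ShieldViolation(Exception):
    """Raised when an access falls outside the active Shield (hypothetical name)."""


@dataclass(frozen=True)
class Range:
    lower: int  # first address inside the Shield
    upper: int  # last address inside the Shield (inclusive)


def layout_consecutive(base: int, sizes: List[int]) -> List[Range]:
    """Place non-nested Shields back to back, each with its requested size."""
    ranges = []
    for size in sizes:
        ranges.append(Range(base, base + size - 1))
        base += size
    return ranges


def carve(parent: Range, offset: int, size: int) -> Range:
    """Give a nested Shield a portion of its enclosing Shield's range."""
    child = Range(parent.lower + offset, parent.lower + offset + size - 1)
    if offset < 0 or size <= 0 or child.upper > parent.upper:
        raise ValueError("nested Shield must fit inside its enclosing Shield")
    return child


def check_access(active: Range, addr: int, width: int = 1) -> None:
    """Software stand-in for one lower and one upper bounds check per access."""
    if addr < active.lower:              # lower-bound check
        raise ShieldViolation(hex(addr))
    if addr + width - 1 > active.upper:  # upper-bound check
        raise ShieldViolation(hex(addr))


# Illustrative layout: two 1 MiB top-level Shields and one nested 64 KiB Shield.
MIB = 1 << 20
shield_a, shield_b = layout_consecutive(0x10000000, [MIB, MIB])
nested = carve(shield_a, offset=0x1000, size=0x10000)

check_access(shield_a, nested.lower, 8)      # enclosing Shield may touch nested memory
try:
    check_access(nested, shield_a.lower, 8)  # nested Shield may not reach outwards
except ShieldViolation:
    print("access outside the active Shield rejected")
```

Because a nested Shield is a sub-range of its parent, a single pair of bounds for the active Shield is enough to decide every access in this model, which is consistent with the text's use of one reserved bounds register.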
1 This group of functions also includes those that bridge LKL and musl, as well as some debug functions.
2 The CFI pass assigns types to the functions in LLVM’s IR and builds a jump table, which is used to validate dynamically matching function types in forward-edge indirect control flow transfers.
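
The type-matching idea behind forward-edge CFI, as summarised in footnote 2, can be modelled in a few lines. The sketch below is illustrative only and does not reflect how LLVM implements its checks: `JumpTable`, `CfiViolation`, the string-based `Signature` encoding and the two example functions are hypothetical. It shows an indirect call being allowed only when the target was registered under the type expected at the call site.

```python
from typing import Callable, Dict, Set, Tuple

# A function type written as (return type, parameter types...); an assumed encoding.
Signature = Tuple[str, ...]


class CfiViolation(Exception):
    """Raised when an indirect call targets a function of the wrong type."""


class JumpTable:
    """Hypothetical table that groups valid call targets by function type."""

    def __init__(self) -> None:
        self._targets: Dict[Signature, Set[Callable]] = {}

    def register(self, sig: Signature, fn: Callable) -> Callable:
        """Record `fn` as a valid target for call sites of type `sig`."""
        self._targets.setdefault(sig, set()).add(fn)
        return fn

    def checked_call(self, static_sig: Signature, target: Callable, *args):
        """Perform an indirect call only if the target's registered type
        matches the static type of the call site."""
        if target not in self._targets.get(static_sig, set()):
            name = getattr(target, "__name__", repr(target))
            raise CfiViolation(f"{name} does not have type {static_sig}")
        return target(*args)


def reverse_bytes(buf: bytes) -> bytes:
    return buf[::-1]


def log_message(msg: str) -> None:
    print(msg)


table = JumpTable()
BYTES_TO_BYTES: Signature = ("bytes", "bytes")
table.register(BYTES_TO_BYTES, reverse_bytes)
table.register(("void", "str"), log_message)

print(table.checked_call(BYTES_TO_BYTES, reverse_bytes, b"abc"))
try:
    table.checked_call(BYTES_TO_BYTES, log_message, "hi")  # wrong type for this call site
except CfiViolation as err:
    print("blocked:", err)
```

As in the text, this covers calls only; nothing in the model constrains where a function returns to.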
Figure 7. Enclaved, multi-component web service: (a) Multi-enclave deployment; (b) Deployment with SSF
Figure 8. PostgreSQL with Spons
Figure 9. Module isolation
SSF does not preclude the use of security-sensitive instructions such as EMODPE [72]. Existing techniques are applicable here, e.g., an additional code generation pass to avoid inserting such instructions [78], or compiler labelling with post-compilation binary inspection [72].

5 Discussion

SSF offers better, yet portable compartmentalisation abstractions for TEEs. It uses TEE technologies (SGX here) to remove trust from the host OS, enforces the isolation between application components as well as the trusted TEE OS, and provides a narrow host interface to minimise reliance on critical host OS features (fulfilling our intended threat model).

Spons and Shields can be applied at arbitrary granularities, from processes to libraries (R1 in §2.3), support shared memory by using a single Shield for multiple Spons (R2) and are thus able to enhance the security of existing POSIX applications (R3). SSF also offers insights into the minimum requirements for compartmentalisation inside a TEE (R4). SGX is particularly challenging due to its lack of extensive hardware support, and we show a promising direction that couples simple hardware support (MPX) with software mechanisms (SSFPass and the TEE OS primitives).

More importantly, the semantics of Spons and Shields allow SSF to take advantage of both existing and future hardware mechanisms to further improve compartmentalisation performance. For example, we envision future incarnations of SSF using hardware such as CHERI capabilities, Intel MPK, or Intel’s CET extensions. Finally, note that SSF uses SGXv1 and thus cannot self-manage execution permissions on pages; nevertheless, switching to SGXv2 would close that gap by using the new EMODPE and EMODPR instructions.

6 Evaluation

We explore the performance of SSF using two real-world applications: a multi-process multi-component web service, as introduced in Fig. 1 (§6.2) and a Python cryptographic library (§6.3). We also measure the overhead of the compiler instrumentation (§6.4), and the cost of Spon creation (§6.5).

We deploy the workloads on SGX-enabled servers with Intel Xeon E3-1280 v6 CPUs (microcode version 0xca), each with 4 cores at 3.90 GHz, 8 MB of LLC and disabled hyper-threading and Turbo Boost. The servers have 64 GB of RAM, a 10 Gbps NIC, and run Ubuntu Linux 18.04 with Linux kernel 4.15.0-46. The version of the Intel SGX driver is 2.5.

6.1 Application Use Cases

We consider two application use cases for fine-grained compartmentalisation: (i) a web service inspired by our healthcare example with a mix of active/passive Spons and nested Shields; and (ii) a Python cryptographic framework that uses passive Spons for hardening against vulnerabilities in its C libraries. These represent different types of workloads: the former involves a pointer-heavy interpreter, an I/O-intensive web server, and a database with a complex child/parent lifecycle; the latter heavily invokes isolated modules.

Healthcare web service. We evaluate a typical LAMP-based architecture, with a front-end web server (NGINX v.1.16.1), a PHP interpreter (v.7.3.7), and a database backend (PostgreSQL v.12.2). The NGINX web server plays a crucial role in terms of security – it establishes encrypted network connections with its clients, and redirects their requests to the PHP interpreter, which, in turn, interacts with the DBMS.

NGINX redirects client connections to pre-forked PHP processes using FastCGI over sockets, applying a well-understood balance between isolating requests across separate PHP processes, and avoiding the cost of per-request PHP process creation. We use PostgreSQL because it is fully-featured (vs. SQLite) and yet has a small memory – and thus EPC – footprint (e.g., compared to MySQL). PostgreSQL spawns at least six processes that perform different functions; each process is derived from the same binary, and all processes use shared memory to exchange data (see Fig. 8).

Fig. 7 shows two deployments for our web service: the first uses vanilla SGX-LKL [58] to deploy the application in multiple SGX TEEs; the second uses a single SGX TEE with SSF. Note that SSF can isolate the web server and its SSL library, which is not possible in the multi-TEE configuration without compromising performance. Existing solutions also cannot isolate shared libraries, and previous approaches to emulating isolated processes within an SGX TEE [72] lack the shared memory support required to run PostgreSQL.
Figure 10. Request latency (Multi-process web service) [x-axis: Transfer size (B), log scale]
Figure 11. Request latency (Single-process NGINX) [x-axis: Transfer size (B), log scale]
Figure 12. Creation time (Spons vs. SGX TEE) [x-axis: Number of instances]
Fig. 7b shows how we map the application to SSF. Spon and Shield pairs transparently replace processes, and execve replaces fork + exec. NGINX isolates its OpenSSL library and the private encryption keys into a separate passive Spon with a nested Shield, and all PostgreSQL Spons use a single Shield to allow shared memory (see Fig. 8). We compile all components using SSFPass, link them against spn-musl, change process creation to use execve, and configure the loader to create appropriate Spons and Shields. We also minimally modify NGINX to isolate its OpenSSL component into a nested Shield using spon_call (the OpenSSL function wrappers do not require data marshalling and can directly operate on the pointers provided by NGINX).

Python cryptographic library. PyCryptodome [54] is a popular cryptography framework for Python, implemented as a wrapper over C functions. High-level cryptographic operations in Python invoke binary modules written in C that can, e.g., use hardware cryptographic instructions such as AES-NI. Since the low-level cryptographic modules may have memory safety issues, we use Shields to isolate them.

Fig. 9 shows the design of Python code that isolates each of the PyCryptodome modules (i.e., ciphers written in C) using separate passive Spons. A single Python interpreter uses a set of Spon-Shield pairs. PyCryptodome’s indirection layer manages the modules, creating the necessary Spons and Shields and invoking their functions. We create a new Spon-Shield pair for each encryption context in PyCryptodome, which contains functions that can be called multiple times.

The web service and PyCryptodome applications showcase the use of asymmetric memory protection in opposite ways. In the web service, NGINX calls into a protected OpenSSL encryption library, whereas in PyCryptodome, the Python interpreter protects itself from the memory-unsafe cryptographic functions that it invokes. As a consequence, PyCryptodome’s indirection layer is responsible for (un)marshalling the arguments and results for the C-module functions into/from their assigned Shield (via alloc_mem; see §3.3).

6.2 Multi-Process Web Service

We deploy the multi-process web service with SSF with a PHP-based content management system (CMS) that serves markdown pages stored in the DBMS. We generate 500 MB of markdown files of 2 KB to 2 MB in size, and measure the request latency of a client requesting random pages.

Fig. 10 compares the request latency for different file sizes with SSF to a multi-TEE deployment. (The data points show the mean over 10 runs for each file size; the shaded extent indicates one standard deviation.) The distribution of latencies is always similar, but SSF provides a consistent improvement. Small file sizes (2–32 KB) have an almost constant latency, because PostgreSQL creates a new Spon on each connection, which requires constant time (tied to the size of the Spon); for larger files, the overhead is small in comparison to the data processing and transfer costs across components. The deployment with Spons and Shields has, on average, 2.6× lower request latencies than the multi-TEE one; a deployment without Shields achieves 4.4× lower latencies. (CPU load is consistent.)

These results confirm that, in the multi-TEE deployment, the need to use encryption between TEEs and the absence of caches add more overhead than SSF’s instrumentation.

6.3 Python Cryptographic Library

To measure the overhead when passive Spons are invoked, we compare the performance of PyCryptodome when executed inside a TEE in three configurations: (i) without Spons as a baseline; (ii) with Spons but without memory instrumentation; and (iii) with Spons and memory instrumentation.

As a workload, we use the pct-speedtest.py benchmark provided by PyCryptodome. It measures the performance of various cryptographic operations such as encryption and key initialisation for ciphers. We focus on one benchmark, AESNI, with two forms of the AES cipher (GCM and CTR) with different key lengths (128, 192 and 256 bits).

Fig. 13 shows the key set-up speed, measured in thousands of key initialisations per second. As can be seen from the results, while different ciphers have different performance characteristics, there is a similar trend across all experiments: the use of Spons reduces performance by 1%–3%. Memory instrumentation adds a further 1%–2% overhead. This is smaller than in the previous experiment with active Spons because this workload dereferences fewer pointers.

Next we consider the encryption performance. As Fig. 14 shows, the impact of Spons here is more significant. Using
Figure 13. Key set-up benchmark [y-axis: Key set-ups (1000 keys/sec); x-axis: Cipher configuration; series: Spons, Spons (no instr.), Baseline]
Figure 14. AES-NI benchmark [y-axis: Encryption (MB/sec); x-axis: Cipher configuration]
Figure 15. AES benchmark [y-axis: Encryption (MB/sec); x-axis: Cipher configuration]
Figure 16. Performance of various PHP functions for instrumented and non-instrumented php-fcgi compartments (The instrumentation decreases performance by 22% on average.) [y-axis: Execution time (secs); series: Baseline, Spons]
Spons decreases performance by 53%–76% for CTR, and by 38%–44% for GCM. Adding memory instrumentation reduces performance further by 5%–7% for CTR, and by 2% for GCM.

In the previous experiment, the time spent on memory copies is significant compared to computation time. If computation is expensive, the overhead decreases: the results in Fig. 15 are from the same experiment but without hardware-accelerated AES-NI instructions. The impact of using Spons is also lower: on average, non-instrumented Spons have 57% lower performance for CTR, and 21% for GCM.

6.4 Instrumentation Overhead

Next we benchmark Spon instances to determine if there is a difference in overhead for different types of computation inside Spons. We also consider the impact on binary size.

Since bounds-checking overheads depend on the specific instrumented code, we explore the incurred overheads using three workloads: (1) we run a PHP interpreter inside a Spon and use a benchmark suite [66] that measures the performance of various PHP functions. In contrast to the experiments in §6.2, we directly invoke the PHP interpreter to remove the data exchange overhead between the web server and the PHP worker; (2) we use the pgbench benchmark from PostgreSQL, providing TPC-B-like queries with five SELECT, UPDATE, and INSERT commands per transaction [56, 57]. We use a scale factor of 10 to generate the initial database, which results in 1,000,000 rows, or almost 330 MB of data; and (3) we use the NGINX web server with the OpenSSL library, and have clients fetching static objects of different sizes.

Table 3. Impact of SSF’s memory instrumentation

             PHP                PostgreSQL              NGINX
             Size     Time      Size     Tx per sec.    Size
No instr.    16 MB    243 s     11 MB    528            6.3 MB
Instr.       27 MB    292 s     21 MB    473            11 MB

The PHP benchmark suite allows us to compare the performance of PHP functions; the pgbench and NGINX benchmarks measure overall performance. The NGINX benchmark helps us understand the cost of an asymmetric Spon with OpenSSL.

Performance overhead. For all three benchmarks, we compare deployments under two configurations: (i) an uninstrumented baseline and (ii) Spons with instrumentation.

Tab. 3 shows that the instrumented PHP version has an increase in total execution time of 20% (292 s vs. 243 s). Fig. 16 gives a detailed breakdown of the PHP results: the average overhead is 22% across all benchmarks, with performance degrading from 1.55 to 1.26 million operations per second.

The table also shows the results of the PostgreSQL benchmark. Compared to PHP, PostgreSQL exhibits only a 10% performance decrease: the throughput drops from 528 to 473 transactions per second.

In contrast to the Python benchmark, NGINX does not copy data to/from the Spon, and thus the invocation should have less overhead. To measure this, we generate 513 MB
of random files and fetch them via HTTPS using the curl utility in a single thread. In total, we have 30 files (with sizes from 1 byte to 256 MB, each size a power of 2). We fetch each remote file at least 10 times and measure request latency.

Fig. 11 shows the results for (i) a baseline without instrumentation; (ii) a Spon with instrumentation only; and (iii) a Spon with instrumentation and isolated OpenSSL. For sizes smaller than 100 KB, the response time is the same, and there is no significant overhead from the instrumentation or the use of Spons. For bigger sizes, the response time grows linearly with size, which shows that the overhead is negligible compared to the traffic encryption time.

Binary size overhead. SSFPass’s (see §4.2) memory instrumentation adds new instructions each time the target code accesses pointers. The stripped binary of php-fcgi is 16 MB, whereas the Spon-based PHP interpreter is 1.7× larger at 27 MB (see Tab. 3). The instrumentation increases the size of the NGINX binary, which also includes the statically-linked OpenSSL library, by 1.7× (11 MB vs. 6.3 MB), and the size of the PostgreSQL binary by 1.9× (21 MB vs. 11 MB).

6.5 Instantiation Time

Finally, we compare the time to instantiate Spons against SGX TEEs using the SGX-LKL TEE OS. Our simple benchmark measures the time to sequentially create 100 Spons and SGX TEEs, respectively, in batches of 10. We deploy a simple program in each enclave, and limit the enclave size to 8 MB.

Fig. 12 shows that both SSF and regular SGX TEEs scale linearly in the number of instances, but Spons require sig-

Coarse-grained sandboxing. Past work on software fault isolation (SFI) [20, 69, 81] shares SSF’s goal of coarse-grained sandboxing. Like MemSentry [36], SSFPass uses Intel MPX instead of software-only sandboxing to improve performance, but SSF’s use of MPX differs: MemSentry adds one bounds check but relies on Intel MPK and Intel’s VM functions (VMFUNC). SSFPass adds two bounds checks but is not restricted in the number of domains (MPK) or dependent on technology unavailable inside some TEEs (VMFUNC). SEIMI [82] offers an intra-process isolation technique based on Supervisor-mode Access Prevention (SMAP), but requires code to be executed in privileged mode. Janus [24, 25] supports switching of protection domains without involving the kernel, but also relies on either MPK or VMFUNC. As well as sandboxing mechanisms, SSF provides a flexible abstraction for isolation of existing applications, and addresses TEE-specific issues such as secure and efficient interaction with a TEE OS.

Moat [75] and SIR [74] separate TEE code into untrusted and trusted parts, and certify that untrusted machine code cannot leak confidential information. However, similar to ConfLLVM [8], they only support two compartments. MPTEE [88] uses Intel MPX to provide memory isolation of program regions and implements trusted memory attributes missing in Intel SGXv1. This technology can be potentially applied in SSF to enforce its control-flow integrity.

Embedded systems. Since embedded systems often lack MMU support, recent work has used memory-protection units (MPUs) for compartmentalisation (e.g., for ARM [15, 33] and RISC-V [85]). Similar to Intel MPK, MPUs are re-
nificantly less time. On average, Spons are created in 30 ms, stricted in the maximum number of domains (16 on some
while the regular SGX TEE needs 210 ms. ARM hardware). TIMBER-V [85] addresses this limitation by
combining MPUs with tagged memory, but requires custom
7 Related Work
hardware support unavailable in SGX TEEs. None of these
Partitioning frameworks. Wedge [7] creates isolation en- approaches allow for secure and efficient TEE OS interaction.
tities inside processes using a default-deny security model.
Wedge’s Crowbar tool helps developers find which parts of 8 Conclusions
the program to isolate. PrivTrans [10] is a source-level parti-
TEEs isolate applications from the host system software and
tioning tool that splits an application into two separate parts,
even physical attacks but vulnerabilities in TEE code remain
trusted and untrusted. Both PrivTrans and Wedge rely on the
an issue. Fine-grained compartmentalisation increases secu-
kernel for isolation and cannot be used within some TEEs.
rity through defense-in-depth, but current solutions sacri-
Glamdring [40] partitions an application to use SGX TEEs
fice performance and compatibility. We introduce Spons and
according to source-level annotations. SOAAP [22] is an
Shields, new flexible isolation abstractions for TEEs. Spons
LLVM-based tool to help developers reason about what to
are self-contained isolated memory regions that can behave
isolate. Such partitioning policies could potentially be en-
like sandboxed libraries or processes. Spons are isolated in-
forced using SSF’s isolation primitives.
side Shields by the compiler and use hardware acceleration
Panoply [73] is an SGX-based partitioning infrastructure
for bounds-checking. We show how Spons and Shields can
that supports the fork system call by spawning a new TEE
help compartmentalise an existing application inside a TEE
and copying data from the parent enclave: stronger isolation
and to port a multi-process application to a TEE.
but with higher overhead. GOTEE [19] enables automatic
partitioning of applications written in Go into TEE and un-
Acknowledgements
trusted code, but does not isolate application components
within a TEE. Komodo [17] proposes a flexible software- This work was partially funded by the UK Government’s
defined TEE model. SSF provides interfaces for compartmen- Industrial Strategy Challenge Fund (ISCF) under the Digital
talising applications inside TEEs, that could use Komodo. Security by Design (DSbD) Programme.
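
As a quick cross-check of the evaluation figures, the overhead factors quoted in §6.4 and §6.5 can be re-derived from the absolute numbers given in the text. The following Python sketch is illustrative only and is not part of the SSF artefact: the variable and function names are assumptions, and the inputs are the rounded values printed in the text, so the results are only accurate to the precision reported there.

```python
# Re-derive the overhead factors reported in the evaluation from the
# absolute (rounded) numbers quoted in the text. All names are illustrative.

def ratio(new, old):
    """Return how many times larger `new` is than `old`."""
    return new / old

def relative_drop(before, after):
    """Return the fractional decrease from `before` to `after`."""
    return (before - after) / before

# Binary sizes in MB as (baseline, Spon-based), taken from Section 6.4.
binary_sizes_mb = {
    "php": (16, 27),
    "nginx": (6.3, 11),
    "postgresql": (11, 21),
}

# Size overhead factor per application, rounded to one decimal place.
size_overhead = {
    name: round(ratio(spon, base), 1)
    for name, (base, spon) in binary_sizes_mb.items()
}

# PostgreSQL throughput drop: 528 -> 473 transactions per second.
tps_drop_percent = round(100 * relative_drop(528, 473), 1)

# Average instantiation time: 210 ms per SGX TEE vs. 30 ms per Spon.
instantiation_speedup = ratio(210, 30)

print(size_overhead)
print(tps_drop_percent)
print(instantiation_speedup)
```

Under these inputs the size factors round to the 1.7×, 1.7×, and 1.9× stated in §6.4, the throughput drop is about 10%, and Spon creation is about 7× faster than creating a regular SGX TEE.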
Commodity Hardware. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys ’17). ACM, 437–452.
[37] S. Kuenzer, S. Santhanam, Y. Volchkov, F. Schmidt, F. Huici, Joel Nider, Mike Rapoport, and Costin Lupu. 2019. Unleashing the Power of Unikernels with Unikraft. In Proceedings of the 12th ACM International Conference on Systems and Storage (SYSTOR ’19). ACM, 195.
[38] Dmitrii Kuvaiskii, Oleksii Oleksenko, Sergei Arnautov, Bohdan Trach, Pramod Bhatotia, Pascal Felber, and Christof Fetzer. 2017. SGXBOUNDS: Memory Safety for Shielded Execution. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys ’17). ACM, 205–221.
[39] Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization. IEEE Computer Society, 75–86.
[40] Joshua Lind, Christian Priebe, Divya Muthukumaran, Dan O’Keeffe, Pierre-Louis Aublin, Florian Kelbert, Tobias Reiher, David Goltzsche, David Eyers, Rüdiger Kapitza, et al. 2017. Glamdring: Automatic Application Partitioning for Intel SGX. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). 285–298.
[41] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. 2018. Meltdown: Reading Kernel Memory from User Space. In 27th USENIX Security Symposium (USENIX Security 18). USENIX Association, 973–990.
[42] James Litton, Anjo Vahldiek-Oberwagner, Eslam Elnikety, Deepak Garg, Bobby Bhattacharjee, and Peter Druschel. 2016. Light-Weight Contexts: An OS Abstraction for Safety and Performance. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX Association, 49–64.
[43] Yutao Liu, Tianyu Zhou, Kexin Chen, Haibo Chen, and Yubin Xia. 2015. Thwarting Memory Disclosure with Efficient Hypervisor-Enforced Intra-Domain Isolation. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS ’15). ACM, 1607–1619.
[44] LLVM. Last accessed: March 8, 2021. Control Flow Integrity. https://clang.llvm.org/docs/ControlFlowIntegrity.html.
[45] Anil Madhavapeddy, Richard Mortier, Charalampos Rotsos, David Scott, Balraj Singh, Thomas Gazagnaire, Steven Smith, Steven Hand, and Jon Crowcroft. 2013. Unikernels: Library Operating Systems for the Cloud. SIGARCH Comput. Archit. News 41, 1 (March 2013), 461–472.
[46] Frank McKeen, Ilya Alexandrovich, Ittai Anati, Dror Caspi, Simon Johnson, Rebekah Leslie-Hurd, and Carlos Rozas. 2016. Intel® Software Guard Extensions (Intel® SGX) Support for Dynamic Memory Management Inside an Enclave. In Proceedings of the Hardware and Architectural Support for Security and Privacy 2016. 1–9.
[47] Frank McKeen, Ilya Alexandrovich, Alex Berenzon, Carlos V Rozas, Hisham Shafi, Vedvyas Shanbhogue, and Uday R Savagaonkar. 2013. Innovative Instructions and Software Model for Isolated Execution. HASP@ ISCA 10 (2013).
[48] Microsoft Azure. 2021. Azure confidential computing. https://azure.microsoft.com/en-us/solutions/confidential-compute. Last accessed: March 8, 2021.
[49] Paul Muntean, Mathias Neumayer, Zhiqiang Lin, Gang Tan, Jens Grossklags, and Claudia Eckert. 2020. ρFEM: Efficient Backward-Edge Protection Using Reversed Forward-Edge Mappings. In Annual Computer Security Applications Conference (Austin, USA) (ACSAC ’20). ACM, 466–479.
[50] musl libc. 2021. https://www.musl-libc.org. Last accessed: March 8, 2021.
[51] nginx, an HTTP server. 2021. https://www.nginx.org. Last accessed: March 8, 2021.
[52] Oleksii Oleksenko, Dmitrii Kuvaiskii, Pramod Bhatotia, Pascal Felber, and Christof Fetzer. 2018. Intel MPX Explained: A Cross-Layer Analysis of the Intel MPX System Stack. Proceedings of the ACM on Measurement and Analysis of Computing Systems 2, 2 (2018), 28.
[53] Meni Orenbach, Pavel Lifshits, Marina Minkin, and Mark Silberstein. 2017. Eleos: ExitLess OS Services for SGX Enclaves. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys). ACM, 238–253.
[54] Panoply 2021. A self-contained cryptographic library for Python. https://github.com/Legrandin/pycryptodome. Last accessed: March 8, 2021.
[55] Soyeon Park, Sangho Lee, Wen Xu, Hyungon Moon, and Taesoo Kim. 2019. libmpk: Software Abstraction for Intel Memory Protection Keys (Intel MPK). In 2019 USENIX Annual Technical Conference (USENIX ATC 19). 241–254.
[56] PostgreSQL. 2021. A Simple Benchmark Program for PostgreSQL. https://github.com/postgres/postgres/tree/master/src/bin/pgbench. Last accessed: March 8, 2021.
[57] PostgreSQL. 2021. PostgreSQL 12.3 Documentation. https://www.postgresql.org/docs/12/index.html. Last accessed: March 8, 2021.
[58] Christian Priebe, Divya Muthukumaran, Joshua Lind, Huanzhou Zhu, Shujie Cui, Vasily A. Sartakov, and Peter Pietzuch. 2019. SGX-LKL: Securing the Host OS Interface for Trusted Execution. arXiv preprint arXiv:1908.11143 (2019).
[59] Niels Provos, Markus Friedl, and Peter Honeyman. 2003. Preventing Privilege Escalation. In 12th USENIX Security Symposium (USENIX Security 03). USENIX Association.
[60] Octavian Purdila, Lucian Adrian Grijincu, and Nicolae Tapus. 2010. LKL: The Linux Kernel Library. In 9th RoEduNet IEEE International Conference. IEEE, 328–333.
[61] Ramu Ramakesavan, Dan Zimmerman, and Pavithra Singaravelu. 2015. Intel Memory Protection Extensions (Intel MPX) Enabling Guide.
[62] Charlie Reis. 2018. Mitigating Spectre with Site Isolation in Chrome.
[63] Charles Reis, Alexander Moshchuk, and Nasko Oskov. 2019. Site Isolation: Process Separation for Web Sites within the Browser. In 28th USENIX Security Symposium (USENIX Security 19). USENIX Association, 1661–1678.
[64] Lars Richter, Johannes Götzfried, and Tilo Müller. 2016. Isolating Operating System Components with Intel SGX. In Proceedings of the 1st Workshop on System Software for Trusted Execution. 1–6.
[65] Ryan Roemer, Erik Buchanan, Hovav Shacham, and Stefan Savage. 2012. Return-Oriented Programming: Systems, Languages, and Applications. ACM Transactions on Information and System Security 15, 1, Article 2 (March 2012), 34 pages.
[66] RuSoft. 2021. PHP benchmark script. https://github.com/rusoft/php-simple-benchmark-script. Last accessed: March 8, 2021.
[67] Vasily Sartakov, Lluís Vilanova, and Peter Pietzuch. 2021. CubicleOS: A Library OS with Software Componentisation for Practical Isolation. In Proceedings of the Twenty-Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’21). ACM, 575–587.
[68] Vasily A. Sartakov, Stefan Brenner, Sonia Ben Mokhtar, Sara Bouchenak, Gaël Thomas, and Rüdiger Kapitza. 2018. EActors: Fast and Flexible Trusted Computing Using SGX. In Proceedings of the 19th International Middleware Conference (Rennes, France) (Middleware ’18). ACM, New York, NY, USA, 187–200.
[69] David Sehr, Robert Muth, Cliff Biffle, Victor Khimenko, Egor Pasko, Karl Schimpf, Bennet Yee, and Brad Chen. 2010. Adapting Software Fault Isolation to Contemporary CPU Architectures. In 19th USENIX Security Symposium (USENIX Security 10). USENIX Association.
[70] Hovav Shacham. 2007. The Geometry of Innocent Flesh on the Bone: Return-into-Libc without Function Calls (on the X86). In Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS ’07). ACM, 552–561.
[71] Vedvyas Shanbhogue, Deepak Gupta, and Ravi Sahita. 2019. Security Analysis of Processor Instruction Set Architecture for Enforcing Control-Flow Integrity. In Proceedings of the 8th International Workshop on Hardware and Architectural Support for Security and Privacy (HASP ’19). ACM, Article 8, 11 pages.
[72] Youren Shen, Hongliang Tian, Yu Chen, Kang Chen, Runji Wang, Yi Xu, Yubin Xia, and Shoumeng Yan. 2020. Occlum: Secure and Efficient Multitasking Inside a Single Enclave of Intel SGX. In ASPLOS ’20.
[73] Shweta Shinde, DL Tien, Shruti Tople, and Prateek Saxena. 2017. Panoply: Low-TCB Linux Applications With SGX Enclaves. In Proceedings of the Annual Network and Distributed System Security Symposium (NDSS). 12.
[74] Rohit Sinha, Manuel Costa, Akash Lal, Nuno P. Lopes, Sriram Rajamani, Sanjit A. Seshia, and Kapil Vaswani. 2016. A Design and Verification Methodology for Secure Isolated Regions. In PLDI ’16. ACM, 17 pages.
[75] Rohit Sinha, Sriram Rajamani, Sanjit Seshia, and Kapil Vaswani. 2015. Moat: Verifying Confidentiality of Enclave Programs. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS ’15). ACM, 1169–1184.
[76] The Apache Software Foundation. The Apache HTTP server. 2021. https://www.apache.org. Last accessed: March 8, 2021.
[77] Caroline Tice, Tom Roeder, Peter Collingbourne, Stephen Checkoway, Úlfar Erlingsson, Luis Lozano, and Geoff Pike. 2014. Enforcing Forward-edge Control-flow Integrity in GCC & LLVM. In USENIX Security Symposium. USENIX Sec.
[78] Anjo Vahldiek-Oberwagner, Eslam Elnikety, Nuno O. Duarte, Michael Sammler, Peter Druschel, and Deepak Garg. 2019. ERIM: Secure, Efficient In-process Isolation with Protection Keys (MPK). In 28th USENIX Security Symposium (USENIX Security 19). USENIX Association, 1221–1238.
[79] Jo Van Bulck, Nico Weichbrodt, Rüdiger Kapitza, Frank Piessens, and Raoul Strackx. 2017. Telling Your Secrets without Page Faults: Stealthy Page Table-Based Attacks on Enclaved Execution. In 26th USENIX Security Symposium (USENIX Security 17). USENIX Association, 1041–1056.
[80] Lluís Vilanova, Muli Ben-Yehuda, Nacho Navarro, Yoav Etsion, and Mateo Valero. 2014. CODOMs: Protecting Software with Code-centric Memory Domains. In Intl. Symp. on Computer Architecture (ISCA). 469–480.
[81] Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. 1993. Efficient Software-Based Fault Isolation. SIGOPS Oper. Syst. Rev. 27, 5 (Dec. 1993), 203–216.
[82] Z. Wang, C. Wu, M. Xie, Y. Zhang, K. Lu, X. Zhang, Y. Lai, Y. Kang, and M. Yang. 2020. SEIMI: Efficient and Secure SMAP-Enabled Intra-process Memory Isolation. In 2020 IEEE Symposium on Security and Privacy (SP). 592–607.
[83] Robert NM Watson, Jonathan Woodruff, Peter G Neumann, Simon W Moore, Jonathan Anderson, David Chisnall, Nirav Dave, Brooks Davis, Khilan Gudka, Ben Laurie, et al. 2015. CHERI: A Hybrid Capability-System Architecture for Scalable Software Compartmentalization. In 2015 IEEE Symposium on Security and Privacy. IEEE, 20–37.
[84] Nico Weichbrodt, Anil Kurmus, Peter Pietzuch, and Rüdiger Kapitza. 2016. AsyncShock: Exploiting synchronisation bugs in Intel SGX enclaves. In European Symposium on Research in Computer Security. Springer, 440–457.
[85] Samuel Weiser, Mario Werner, Ferdinand Brasser, Maja Malenko, Stefan Mangard, and Ahmad-Reza Sadeghi. 2019. TIMBER-V: Tag-Isolated Memory Bringing Fine-grained Enclaves to RISC-V. In NDSS.
[86] Yuanzhong Xu, Weidong Cui, and Marcus Peinado. 2015. Controlled-Channel Attacks: Deterministic Side Channels for Untrusted Operating Systems. In Proceedings of the 2015 IEEE Symposium on Security and Privacy (SP ’15). IEEE, 640–656.
[87] Bin Zeng, Gang Tan, and Greg Morrisett. 2011. Combining Control-Flow Integrity and Static Analysis for Efficient and Validated Data Sandboxing. In Proceedings of the 18th ACM Conference on Computer and Communications Security (CCS ’11). ACM, 29–40.
[88] Wenjia Zhao, Kangjie Lu, Yong Qi, and Saiyu Qi. 2020. MPTEE: Bringing Flexible and Efficient Memory Protection to Intel SGX. In Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys ’20). ACM, Article 18, 15 pages.