SoK: Enabling Security Analyses of Embedded Systems via Rehosting
ASIA CCS ’21, June 7–11, 2021, Virtual Event, Hong Kong
Session 7B: Software Security and Vulnerability Analysis (II) ASIA CCS ’21, June 7–11, 2021, Virtual Event, Hong Kong
of applications, embedded systems are typically less secure than [...] socio-economic factors. Embedded systems often contain numerous special-purpose peripherals, leading to a larger attack surface than their general-purpose counterparts. BroadPwn [3], an unauthenti[...] [...]ered or replicated through analysis of a single application, since the exploit chain may leverage behavior of, and data flows between, [...]

[Figure 1: A hypothetical embedded system. Labels: Application Code, Function Code, Kernel and Driver Code, Instruction Set; peripherals GPIO_B, UART1 (Serial Port), UART2 (WiFi module), UART3, Flash, EEPROM.]
to run sufficiently for some analysis task. We define this process as follows:

Definition 4. Rehosting: The process of building an RES for a given embedded system to enable a specified analysis task. It may include modifications to the firmware.

Consider the hypothetical embedded system presented in Fig. 1. By limiting the scope of the VE to only the necessary components, rehosting greatly reduces the barrier to entry for dynamic security analyses. Once firmware can be run in a VE, it is inspectable, mutable, replicable, scalable, and disposable. These properties are critical for dynamic security analyses and are generally absent from physical systems.

Inspectability, scalability, and disposability are critical for large-scale, coverage-guided fuzz testing, where mutated inputs are passed into many copies of a system. By inspecting program execution, a fuzzer can measure code coverage to guide input mutations, expediting discovery of new execution paths and potential bugs [45]. Inspectability also underpins dynamic taint analysis, a technique for tracking how program states and values are derived from specially marked inputs. Taint analysis can identify execution paths in which tainted inputs influence sensitive parts of a system, and taint information can aid both vulnerability discovery and general reverse engineering [71].

Mutability, replicability, inspectability, and disposability enable forced execution. This technique explores a program's state by repeatedly executing uncovered branches via mutation of CPU state at branching instructions. Forced execution can generate control flow graphs and call graphs that are more accurate than those produced by static analysis. It can also aid dynamic type reconstruction [63].

More broadly, accurately rehosting a system into a VE where it is fully inspectable enables security analyses to consider any possible input to a system and the resulting behavior. If the inputs to an embedded system can be configured to produce undesired behavior, an attacker may leverage this vulnerability and craft the necessary inputs to exploit the system.

Rehosting fundamentally requires decoupling a system's software stack from its physical hardware, but this decoupling can also occur at higher layers of abstraction. Logic within user space applications (function layer), or even which applications are run (application layer), may be modified to enable rehosting. If an OS is present, it may also be modified to enable rehosting (OS layer). Finally, the peripherals and CPUs of a physical system may be modified to support rehosting (hardware layer). Vulnerabilities in embedded systems may be caused by one or more mistakes in a single abstraction layer, or by interactions between mistakes across multiple layers. Therefore, modifications must be made with caution.

2.1 Multi-Layer Vulnerabilities
To illustrate how vulnerabilities arise from interactions across system layers, consider the auth_user function in Listing 1. It compares user input against a password stored in non-volatile storage (NVRAM). The use of the gets function introduces a clear vulnerability contained in the function layer. Other bugs in the snippet can also turn into vulnerabilities, depending on the hardware environment that executes this code.

Note that the return values of nvram_open and nvram_get are never checked. These functions may return NULL if the peripheral acting as NVRAM fails. On general-purpose computers this would lead to a crash. Embedded systems, however, commonly fail to deploy memory protections and allow mapping of the NULL page. In such a scenario, an attacker who can predict the contents of the NULL page would be able to bypass the authentication check.

Even seemingly bug-free portions of code can turn into vulnerabilities depending on the hardware environment in which they are executed. The secure_memcmp function in Listing 1 aims to prevent timing side-channel attacks for password retrieval by performing a constant-time string comparison. However, this mitigation assumes that the system executes OR and XOR operations without data-dependent differences in speed. If this assumption is incorrect, an exploitable timing side channel may exist.

If a security analysis is conducted using a VE that fails to sufficiently capture the behavior of these layers, the analysis results may be inaccurate. As such, it is essential to understand any modifications to abstraction layers made by an RES.

    #include <fcntl.h>   // O_RDONLY
    #include <stdio.h>   // gets (deliberately unsafe; removed in C11)
    #include <string.h>  // strlen

    // nvram_handle_t, nvram_open, and nvram_get stand for a
    // hypothetical vendor NVRAM API.

    int secure_memcmp(char *s1, char *s2, int len) {
        int res = 0;

        // Determine if any characters mismatch
        for (int i = 0; i < len; i++)
            res |= s1[i] ^ s2[i];
        return res;  // 0 if the first len bytes match
    }

    int auth_user(void) {
        nvram_handle_t h;
        char *pwd;
        char buf[32];

        // Read stored password from nvram
        h = nvram_open("/dev/nvram", O_RDONLY);
        pwd = nvram_get(h, "user_pass");

        // Read data from (untrusted) user
        gets(buf);
        return secure_memcmp(buf, pwd, strlen(pwd));
    }

Listing 1: Insecure authentication function (hypothetical).

3 CHALLENGES TO BUILDING VIRTUAL ENVIRONMENTS
The hardware and software of embedded systems are tightly coupled and tailored to perform a set of specific operations. Examples of such systems are provided in Appendix A. Embedded systems are incredibly diverse, as each is designed to satisfy a novel combination of use case, power, performance, and cost constraints. These constraints affect all phases of system design, including instruction set architecture selection, peripheral selection, hardware design, and software functionality. To satisfy them, designers routinely sacrifice modularity and standardization (typically emphasized on general-purpose computers) for custom logic that assumes specific hardware configurations. These assumptions introduce challenges for security analyses that seek to evaluate firmware in a VE.
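A hypothetical C sketch can illustrate the custom, hardware-specific logic described above. It mimics firmware that transmits a byte by polling a UART status register. The register layout, the meaning of bit 0, and the example address in the comments are illustrative assumptions and are not taken from any real SoC.

Suppose a VE provides no model for this peripheral and the register simply reads as zero: the ready bit never sets. A minimal model that always reports "ready" is enough to let execution continue.

```c
#include <stdint.h>

/* Hypothetical register layout for an imaginary UART. Real SoCs define their
   own offsets and bit meanings (e.g., in vendor headers or SVD files). */
typedef struct {
    volatile uint32_t status; /* bit 0: transmitter ready (assumed) */
    volatile uint32_t data;   /* writing a byte here transmits it (assumed) */
} uart_regs_t;

#define UART_STATUS_TX_READY 0x1u

/* Real firmware would typically hard-code the peripheral address, e.g.
   (uart_regs_t *)0x4000C000 (hypothetical). Here the pointer is a parameter
   so the sketch can run on a host machine. */
int uart_putc(uart_regs_t *uart, char c, unsigned max_polls)
{
    unsigned polls = 0;

    /* Busy-wait until the hardware reports it can accept a byte. Production
       firmware often spins forever here; the bound lets the sketch report
       the failure instead of hanging. */
    while ((uart->status & UART_STATUS_TX_READY) == 0) {
        if (++polls >= max_polls)
            return -1; /* peripheral never became ready */
    }
    uart->data = (uint32_t)(unsigned char)c;
    return 0;
}
```

With a zero-initialized uart_regs_t, uart_putc returns -1 after max_polls reads. Setting UART_STATUS_TX_READY in status lets the byte reach data. Firmware rarely bounds such loops, so in a VE without this model it would simply hang.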
As a result of the diversity of embedded systems, multiple taxonomies for classifying them exist. We follow the security-oriented classification proposed by Muench et al. [60], which categorizes embedded systems based on their deployed OS type. Type-1 systems use general-purpose OSs retrofitted for embedded systems; Type-2 systems use custom embedded OSs; and Type-3 systems do not use OS abstractions at all. Non-embedded systems are described as Type-0.

3.1 Obtaining Firmware
Conducting dynamic analyses of any system fundamentally requires access to the code the system executes (compiled or as source). Unlike with traditional software systems, however, possession of an embedded system does not immediately enable analysis of its logic. Obtaining its firmware may require significant resources, especially as firmware may be encrypted at rest or guarded by hardware readout protections. In these cases, invasive hardware attacks [61, 64, 79] can typically be used to extract firmware from an embedded system, but they require specialized equipment such as scanning electron microscopes and focused ion beams [78]. On the other hand, non-invasive hardware attacks (e.g., connecting to debug interfaces) and software-based techniques (e.g., downloading or intercepting firmware updates, or software exploitation) for extracting firmware vary from one embedded system to the next. When available, these approaches provide an alternative path to firmware extraction without the need for specialized equipment [80].

3.2 Understanding Instruction Set Architectures
Though some embedded systems use programmable logic chips such as field programmable gate arrays (FPGAs) or complex programmable logic devices, most rely on a primary, general-purpose CPU. An instruction set architecture (ISA) describes how a CPU decodes and executes machine instructions.
The x86 ISA family is used for the vast majority of general-purpose computers and, as such, many systems have been developed to analyze and emulate it [48, 70]. The embedded market, on the other hand, routinely uses ARM, MIPS, PPC, AVR, and other ISA families. Within each family there are often incompatibilities between various offerings. Some ISAs, such as MIPS and Xtensa, also allow vendor customization, which creates even more diversity.
The diversity of ISAs used in embedded systems poses challenges to security research, as analyzing machine code for any given system requires a detailed understanding of the system's ISA. A VE must capture this understanding through a Virtual Execution Engine:

Definition 5. Virtual Execution Engine (VXE): A mechanism for interpreting instructions for a given ISA in a VE.

A VXE may provide this interpretation using a model of an ISA specification, or by running an interpreter on an intermediate representation of compiled code (e.g., McSema [25]). An HES should provide a VXE that fully captures an ISA's semantics, but an RES need only support the subset of the ISA that is actually used by a given firmware.

3.3 Modeling Peripherals
Peripheral devices work alongside the CPU to provide additional functionality and interface with the external world. Traditional desktop systems use peripheral enumeration to dynamically discover devices as they are connected and disconnected over external (e.g., USB) or internal (e.g., PCIe) buses [30, 32]. Moreover, BIOS and UEFI both provide standardized OS/HW interface abstractions for desktop systems: the OS can query for peripherals not present on enumerable buses [31] and can assume a standard I/O address space [5].
In contrast, embedded systems often lack an equivalent OS/HW interface abstraction and instead rely on a fixed set of permanently connected peripherals. Because the manufacturer knows the hardware configuration, peripheral configurations are often tightly coupled with the OS and applications. As such, VEs generally must model these peripherals to ensure systems behave as expected. Beyond ensuring functionality, modeling peripherals in VEs is critical for security analyses, as peripherals are typically the source of attacker-controlled data.

3.4 Evaluating Fidelity
The fidelity of a rehosted system describes how well the behavior of an RES mirrors its physical counterpart. The literature lacks a formal definition of rehosting fidelity, and no large-scale fidelity evaluations have been conducted to date. However, individual rehosting techniques have been evaluated in three ways: by measuring whether systems accept network connections [10], by collecting and comparing peripheral interactions [49], and by comparing the similarity of instruction traces [8].
We observe that, in the general case, measuring rehosting fidelity is an instance of an undecidable variation on the equivalence problem [87]. Current rehosting techniques commonly produce RESs whose behavior clearly differs from that of their physical counterparts. Fidelity can therefore typically be described by measuring the observable differences.

4 QUANTIFYING THE DIFFICULTY OF EMBEDDED HARDWARE EMULATION
To motivate the need for firmware-centric rehosting, we analyze two corpora of machine-parsable hardware descriptions, jointly representing over two and a half thousand embedded SoCs. We then evaluate the tractability of using open-source HESs to replicate the described hardware.
With these corpora, we measure the availability of VXEs for the described CPUs and HES support for the peripherals found in the described systems. Subsequently, we estimate the complexity of the described peripherals and evaluate, through a Monte Carlo simulation, the feasibility of generating peripheral models as needed. Our results show that developing HESs does not scale. They also show a large gap between the hardware platforms such systems support and the platforms used by firmware.

4.1 Datasets
Our datasets consist of 1,956 individual Device Tree Blob (DTB) files from Linux kernel version 5.11.4 (https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.11.4.tar.xz, February 2021) and 618 manufacturer-provided
System View Description (SVD) files² conforming to the Cortex Microcontroller Software Interface Standard. DTB files are standardized descriptions of hardware (CPU and on/off-chip peripherals) for Type-1 embedded systems, agnostic of OS or architecture. An OS kernel parses them during boot to drive hardware initialization. Systems in our Linux dataset range from development boards (e.g., ARM Versatile Platform Baseboard) to commercial products (e.g., Nintendo Gamecube). Linux DTB support dates back to 2005 [51], with usage in PPC kernels ≥ 2.6 and ARM kernels ≥ 3.7.
The SVD files describe ARM Cortex hardware for Type-2 and Type-3 embedded systems, such as medical devices and mesh network transmitters, respectively. Though OS-agnostic, they describe exclusively ARM architecture systems. SVD files are used during development and debugging to understand the memory-mapped interface between a CPU and an SoC's peripherals [2].
Prior firmware measurement studies [10, 19] used web crawlers to collect publicly available firmware images. These approaches are difficult to reproduce: copyright restrictions prevent the collected corpora from being distributed, and link rot prevents crawlers from running successfully after release. In contrast, kernel source and SVD datasets are widely available and mirrored, making our approach fully reproducible. Appendix B describes our methodology in detail. Our analysis code is publicly available and containerized³.

² https://github.com/posborne/cmsis-svd
³ https://github.com/igloo-re/rehosting_sok

4.2 Approach
We use two corpora to analyze 2,574 real-world hardware configurations. We only consider architectures (DTB corpus) or silicon vendors (SVD corpus) with 10 or more distinct samples. Each sample maps to exactly one real-world embedded system. We quantify:
(1) VXE availability, by contrasting corpora CPU models against those supported in a mature, open-source HES;
(2) VE diversity, by measuring the number of unique peripherals in corpora SoCs;
(3) VE complexity, by measuring the complexity of corpora SoCs as a function of peripheral driver code size; and
(4) Tractability of HES implementation, by simulating VE creation to measure transferable work.
All of these analyses use data from our DTB corpus; diversity (2) and tractability (4) are supplemented with information from our SVD corpus.

4.3 Availability: Virtual Execution Engines
A critical component of a VE is its VXE. To evaluate the availability of VXEs for real-world embedded systems, we contrast CPU models present in our DTB corpus with those supported in the QEMU emulator [4], the predominant open-source HES. Matches are based on exact CPU core (e.g., cortex-a9), not ISA version (e.g., ARMv7-A). Using an alternate model of the same ISA version may sometimes be possible, but in practice such a substitution may introduce discrepancies (e.g., an illegal extension instruction). Appendix B.1 provides additional details of our CPU model matching methodology.
The results of this comparison are presented in Table 1. Note how little CPU support has increased over time, both in absolute counts and as a percentage of corpus CPUs. In the worst case for modern QEMU, only 12% of the observed ARM64 CPU models are supported. Appendix B.4 provides data on the similarly minuscule increase in peripheral totals across these versions.

Table 1: Observed CPU models supported by QEMU versions.

              v2.11.1 (Feb. '18)           v5.2.0 (Dec. '20)
    Arch      Models     Dataset           Models     Dataset
              avail.     supported         avail.     supported
    ARM       31         20%               36         20%
    ARM64     33         9%                39         12%
    MIPS      15         50%               16         50%
    PPC       407        53%               407        53%

4.4 Diversity: Unique Peripherals
Beyond the VXE, a VE for an SoC must also handle peripheral interactions. Across our corpora, we see 14,715 distinct peripherals, a quantity that far exceeds the number of supported peripherals in any modern HES. Our methodology for peripheral identification is detailed in Appendix B.2.
Table 2 shows the SoC count and unique peripheral count for each architecture (DTB corpus) and each ARM Cortex vendor (SVD corpus). We also report the mean and standard deviation of the peripheral count per SoC, as well as the median peripheral count per SoC (denoted x̃). From these data, a Type-1 PPC HES would need to support 1,422 unique peripherals to cover all 196 Type-1 PPC SoCs, or 31 peripherals, on average, for any single SoC. The diversity of peripherals in Type-2 and Type-3 ARM Cortex systems varies by silicon manufacturer. The mean and median number of peripherals per SoC, however, are comparable to those of Type-1 ARM systems.

Table 2: Peripheral diversity.

(a) Type-1 Linux Systems (DTB corpus)

    Arch     |SoC|    Unique P    µ ± σ P/SoC    x̃ P/SoC
    ARM      1,310    6,858       58 ± 26        55
    ARM64    430      3,653       58 ± 24        59
    MIPS     20       270         21 ± 11        16
    PPC      196      1,422       31 ± 19        27

(b) Type-2 and Type-3 ARM Cortex Systems (SVD corpus)

    Vendor        |SoC|    Unique P    µ ± σ P/SoC    x̃ P/SoC
    Atmel         147      416         34 ± 10        30
    Freescale     133      561         49 ± 13        47
    Fujitsu       100      237         44 ± 9         41
    NXP           24       374         28 ± 18        21
    STMicro       72       852         59 ± 22        58
    SiliconLabs   10       62          40 ± 2         40
    Spansion      88       193         44 ± 9         42
    TI            52       95          27 ± 4         26

In contrast to the diversity of peripherals in real-world embedded systems, modern open-source HESs support relatively small sets of peripherals. For example, QEMU version 5.2.0 has models for 337 ARM64 peripherals and 216 PPC peripherals. OVPsim [53], an HES leveraging standardized peripheral models, has only 23 ARM peripherals and 3 MIPS peripherals.
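The bookkeeping behind Table 2 can be sketched in a few lines of C. This is an illustrative sketch, not the authors' published analysis code. It assumes each peripheral has already been reduced to a canonical string identifier; how identifiers are actually derived is described in Appendix B.2 and is not reproduced here.

The sketch computes two of the table's columns: the number of unique peripherals across a set of SoCs, and the median peripheral count per SoC (x̃). The tie rule for an even number of SoCs (averaging the two middle values) is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* qsort callback: compare two C-string pointers lexicographically. */
static int cmp_str(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}

/* qsort callback: ascending order for ints, without overflow. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Counts distinct peripheral identifiers (the "Unique P" column) among the
   n (SoC, peripheral) entries in names. Sorts names in place. */
size_t count_unique_peripherals(const char **names, size_t n)
{
    size_t unique = 0;
    qsort(names, n, sizeof *names, cmp_str);
    for (size_t i = 0; i < n; i++)
        if (i == 0 || strcmp(names[i], names[i - 1]) != 0)
            unique++;
    return unique;
}

/* Median of per-SoC peripheral counts (the x̃ column). Sorts counts in place.
   Even-length input averages the two middle values (assumed tie rule). */
double median_count(int *counts, size_t n)
{
    if (n == 0)
        return 0.0;
    qsort(counts, n, sizeof *counts, cmp_int);
    if (n % 2 == 1)
        return counts[n / 2];
    return (counts[n / 2 - 1] + counts[n / 2]) / 2.0;
}
```

For example, the identifiers {"uart", "gpio", "uart", "i2c", "gpio", "spi"} collapse to 4 unique peripherals, and per-SoC counts {58, 21, 31, 40} have a median of 35.5. Sorting keeps the sketch dependency-free; a hash set would avoid the O(n log n) sort for large corpora.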
4.5 Complexity: Driver SLOC as a Proxy
We can approximate peripheral and SoC complexity for embedded Linux systems by measuring Source Lines Of Code (SLOC) for open-source drivers corresponding to peripherals referenced in our DTB corpus. Driver SLOC is only a proxy for complexity, because a device driver implements only half of the OS/hardware "conversation." Each driver interfaces with a hardware peripheral whose internal states and logic may not be proportionally as complex as the device driver. We therefore use SLOC data only as a rough estimate of the software engineering effort required to build a model of a given peripheral. Appendix B.3 describes our SLOC computation methodology and analyzes the correlation between QEMU peripheral SLOC and device driver SLOC.

Table 3 shows the mean and standard deviation of SLOC for device drivers and SoCs. Notably, the standard deviation of SLOC per driver can exceed the mean, indicating considerable variability in driver complexity. Taking ARM64 as an example, we can expect the average SoC to require over 44,000 SLOC of device driver code, implemented as kernel code, to manage its hardware. Not every device driver will mirror the complexity of its associated hardware peripheral. Our sample of open-source drivers may also not be representative of their closed-source counterparts. Even so, these SLOC measurements hint at the scale of software engineering effort required to build HESs for disparate SoCs.

Table 3: SLOC for open-source device drivers (|DD|: number of device drivers).

    Arch     |DD|     µ ± σ SLOC/DD      |SoC|    µ ± σ SLOC/SoC
    ARM      3,783    617.29 ± 650.50    1,310    43,036.59 ± 21,598.35
    ARM64    2,414    665.87 ± 716.87    430      44,088.23 ± 20,404.85
    MIPS     175      383.46 ± 465.71    20       9,406.93 ± 5,318.66
    PPC      324      495.08 ± 428.79    196      22,879.97 ± 12,861.21

4.6 Tractability: Simulation of Emulating Hardware Systems
One potential strategy for building an HES that supports a large number of SoCs is to incrementally add support for new hardware components the first time each component is present in an SoC of interest. If this strategy were pursued from scratch, building the HES for the first SoC would require modeling the VXE and all its peripherals. Extending the HES to support subsequent SoCs would require less effort whenever the VXE and peripherals had been used by a prior SoC and were thus already supported. This approach would be viable for generating VEs if, after some initial effort building models for common VXEs and peripherals, the problem became significantly easier. Two Monte Carlo simulations of building an HES to support SoCs from our datasets show that this is not the case. We conclude that building or extending HESs is an impractical approach to building VEs for SoCs of interest.

In our simulations, we imagine an analyst who selects SoCs at random and builds an HES by creating new peripheral models and VXEs whenever an SoC has a never-before-seen peripheral or CPU. To simplify the result summary, we treat CPUs and peripherals equivalently throughout the simulation. This does not affect result validity, since both must be modeled in an HES. We also do not subtract QEMU-supported CPUs or peripherals from the modeled total. This ensures our results reflect the difficulty of HES construction in general, not just the current state of QEMU; we contrast against QEMU only to provide context. For each randomly selected SoC, we record how many new peripherals must be modeled. This estimates the marginal work required to update the HES to support the new SoC, given all the prior (simulated) work. Algorithm 1 depicts this simulation in detail.

    SoCs = [...]  // Systems from DTB or SVD corpus
    for sim_round = 1 to 1000 do
        Pm = ∅
        // Randomly sample 10% of SoCs (without replacement) and simulate
        // updating the HES to support each
        for i = 1 to |SoCs|/10 do
            system = get_rand_from(SoCs)
            Pu = ∅
            foreach p ∈ system do
                if p ∉ Pm then
                    Pu = (Pu ∪ {p})
                end
            end
            Pm = (Pm ∪ Pu)
            Record(|Pu|)  // Per-system effort
        end
        Record(|Pm|)  // Cumulative effort
    end

Algorithm 1: Simulation of peripheral modeling.

In each round of the simulation, we sample 10% of the relevant corpus (at random and without replacement) and simulate building an HES to support the selected systems. The peripherals supported by the HES are tracked in Pm, which begins as an empty set. As we iterate through the selected systems in random order, we populate Pu with the peripherals used by each system that the HES does not yet support. After examining each system, we simulate updating the HES to support these peripherals by adding Pu into Pm. Thus, |Pm| is a running total of aggregate effort, and |Pu| is the per-system required effort. This Monte Carlo simulation runs for 1,000 rounds.

To measure how much of our theoretical implementation work transfers between SoCs, we consider the mean count of unimplemented peripherals, |P̄u|, for each of the sampled systems. Fig. 2 shows this value across all rounds for each of our simulations. Note that for the final (195th) Linux system, we still have to implement 11.4 ± 14.4 peripherals, on average. These simulations suggest that even if an analyst manually implemented thousands of peripherals in an HES, missing peripherals would remain a major roadblock to supporting subsequent SoCs. The simulation's random selection naturally accounts for "most common" peripherals: peripherals shared by many SoCs are likely to be drawn, and therefore modeled, early in each round. Even so, the results show that peripherals are so diverse that such prioritization is not helpful.

On average, an HES capable of supporting a randomly selected 10% of our SoCs would need to support 3,947 ± 159 peripherals for Linux systems, or 1,730 ± 125 peripherals for ARM Cortex systems. By contrast, QEMU 5.2 and OVPsim implement a total of 1,083 and 216 peripherals, respectively, for the surveyed architectures.
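A runnable C rendering of one round of Algorithm 1 on a toy corpus may make the bookkeeping concrete. It is a sketch under several stated assumptions, none of which are prescribed by the algorithm itself:

- Peripherals are small integer IDs below a hypothetical bound MAX_PERIPH.
- Pm and Pu are boolean membership arrays rather than general sets.
- Sampling without replacement uses a partial Fisher-Yates shuffle.
- Randomness comes from a seeded xorshift32 generator, so results are reproducible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PERIPH 64 /* toy bound on distinct peripheral IDs (assumption) */
#define MAX_SOCS   64 /* toy bound on corpus size (assumption) */

/* A toy SoC: a list of integer peripheral IDs. */
typedef struct {
    const int *periph;
    size_t n_periph;
} soc_t;

/* xorshift32 PRNG; the state must be seeded with a nonzero value. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* One round of Algorithm 1: draw k SoCs without replacement. For each, record
   |Pu| (peripherals not yet in Pm) in pu_sizes, then merge Pu into Pm.
   Returns |Pm|, the cumulative number of modeled peripherals. */
size_t simulate_round(const soc_t *socs, size_t n_socs, size_t k,
                      uint32_t *rng, size_t *pu_sizes)
{
    bool pm[MAX_PERIPH] = {false}; /* Pm = empty set */
    size_t order[MAX_SOCS];
    size_t pm_size = 0;

    if (n_socs > MAX_SOCS)
        return 0; /* toy bound exceeded; not handled in this sketch */
    for (size_t i = 0; i < n_socs; i++)
        order[i] = i;

    for (size_t i = 0; i < k && i < n_socs; i++) {
        /* Partial Fisher-Yates: pick a not-yet-chosen SoC uniformly. */
        size_t j = i + xorshift32(rng) % (n_socs - i);
        size_t tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
        const soc_t *sys = &socs[order[i]];

        bool pu[MAX_PERIPH] = {false}; /* Pu = empty set */
        size_t pu_size = 0;
        for (size_t p = 0; p < sys->n_periph; p++) {
            int id = sys->periph[p];
            if (id < 0 || id >= MAX_PERIPH)
                continue; /* ignore out-of-range IDs in this toy */
            if (!pm[id] && !pu[id]) { /* p not in Pm (and not yet counted) */
                pu[id] = true;
                pu_size++;
            }
        }
        for (int id = 0; id < MAX_PERIPH; id++) /* Pm = Pm ∪ Pu */
            if (pu[id])
                pm[id] = true;
        pm_size += pu_size;   /* Pu is disjoint from the old Pm */
        pu_sizes[i] = pu_size; /* Record(|Pu|): per-system effort */
    }
    return pm_size; /* Record(|Pm|): cumulative effort */
}
```

Running all 1,000 rounds amounts to calling simulate_round repeatedly with k = n_socs / 10 and keeping each round's pu_sizes array. Because every newly modeled peripheral is counted exactly once, the entries of pu_sizes always sum to the returned |Pm|.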
[Figure 2 plots. y-axis: Mean Num. Peripherals to Model |P̄u|. x-axis, panel (a): Number of SoCs (DTB) with Peripherals Modeled; panel (b): Number of SoCs (SVD) with Peripherals Modeled.]
(a) Unimplemented peripherals in 195 randomly sampled embedded Linux systems at each round of simulation.
(b) Unimplemented peripherals in 64 randomly sampled Cortex systems at each round of simulation.
Figure 2: Average number of peripherals that must be modeled when building 1,000 (simulated) HESs to support 10% of the embedded systems from each corpus. For each HES, a system is selected from the relevant corpus, its count of unmodeled peripherals (|Pu|) is recorded, and its unmodeled peripherals are imagined to be modeled for subsequent system selections.
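Each point of such a curve is a statistic over rounds at a fixed sampling position. For instance, the 11.4 ± 14.4 reported for the 195th Linux system is presumably the mean and standard deviation of |Pu| at one position. The C sketch below computes per-position means and variances from a rounds × k matrix of recorded |Pu| values; the standard deviation is the square root of the variance.

Whether the paper uses population or sample variance is not stated, so population variance is assumed here. A two-pass computation avoids the cancellation error of the E[x²] − m² shortcut.

```c
#include <stddef.h>

/* pu is a row-major rounds x k matrix: pu[r * k + i] holds the |Pu| recorded
   for the i-th system drawn in round r. For each position i, writes the mean
   and the population variance (assumed) of |Pu| across rounds.
   Requires rounds > 0. */
void position_stats(const size_t *pu, size_t rounds, size_t k,
                    double *mean, double *variance)
{
    for (size_t i = 0; i < k; i++) {
        /* First pass: mean at position i. */
        double sum = 0.0;
        for (size_t r = 0; r < rounds; r++)
            sum += (double)pu[r * k + i];
        double m = sum / (double)rounds;

        /* Second pass: mean squared deviation from m. */
        double sq = 0.0;
        for (size_t r = 0; r < rounds; r++) {
            double d = (double)pu[r * k + i] - m;
            sq += d * d;
        }
        mean[i] = m;
        variance[i] = sq / (double)rounds;
    }
}
```

With two rounds recording {1, 4} and {3, 0}, position 0 has mean 2 and variance 1, and position 1 has mean 2 and variance 4.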
Emulating the SoCs in our random 10% sample of the DTB corpus therefore requires approximately four times the number of peripherals QEMU currently supports (3,947 vs. 1,083). This does not imply that QEMU is a fourth of the way to being viable for embedded emulation. The total number of manually implemented peripherals, |Pm|, would only grow if we selected a larger percentage of the corpora (e.g., 25% or 100%), since we would be emulating more systems and, consequently, encountering even more unimplemented peripherals. We chose a 10% sample size to demonstrate that a leading modern HES cannot support even a small subset of either corpus. After two years of QEMU development, the situation has not improved. Moreover, new SoCs with new peripherals are constantly being manufactured, so peripheral diversity in the wild is likely to increase over time. This makes manual implementation an unending, expensive endeavor.

To estimate the engineering effort for these peripheral implementations, we track driver SLOC for every modeled peripheral in the DTB corpus. Each time an unmodeled peripheral is encountered, we count its SLOC if source is available. For closed-source drivers, we use the architecture-specific average SLOC per driver. The mean SLOC total for all peripherals implemented by the end of the simulation is 2,448,354. Hence, we conclude that manual peripheral implementation does not scale and that an HES supporting the majority of embedded systems will likely never exist.

5 THE CASE FOR REHOSTING
The analyses in § 4 show that embedded systems are remarkably diverse and impossible to fully support in HESs without automation. In spite of these challenges, there remains a clear need for dynamic analysis of the firmware running on embedded systems, which rehosting can provide. Prior work has taken this firmware-centric approach with a variety of strategies [10, 20, 23, 43, 47, 49, 77, 86]. However, without a standard definition of the underlying process, advances in this space have typically been ancillary to enabling other research tasks, such as fuzzing embedded web applications [20]. We argue that by studying rehosting as a research problem in its own right, the broader research community can find more general solutions that will make narrower problems much easier to solve.

At present, the process of rehosting is more alchemy than chemistry: opaque, unrepeatable, and prone to failure. We hope to see a future in which rehosting is a systematic and scientific endeavor, made possible by standard methodology and effective technologies. To that end, this section identifies the goals of rehosting and contextualizes prior work. It also identifies how different classes of embedded systems present different rehosting challenges, and examines how reducing the scope of a VE can ease the rehosting process.

5.1 Rehosting Goals
Decoupling firmware from its physical dependencies facilitates a wide spectrum of processes, including reverse engineering, training, system evaluation and certification, vulnerability research, and exploit development. Each of these use cases involves a different population of users who wish to leverage rehosting toward a different goal. For instance, hardware and software vendors may wish to use rehosting during testing and development of a product. These vendors may manually build a VE using rehosting techniques that require expert knowledge of their target hardware platform. On the other hand, third parties who lack detailed information on a target platform (e.g., security analysts, reverse engineers, or system integrators) may be interested in rehosting for vulnerability discovery, system understanding, or system verification. These users need an approach to
rehosting that does not require expert knowledge or manual imple- 5.2.1 Pure Emulation. The most straightforward, though labor-intensive
mentation effort. Despite their distinct end goals, both sets of users approach, to building a VE is emulating all the necessary compo-
would benefit from research advancements that improve and poten- nents. As previously shown, building complete emulators for every
tially automate the rehosting process.

5.2 State of the Art

Due to the diversity of hardware platforms and dearth of documentation, it is rarely possible to create an HES that models every layer of even a single embedded system. As a result, RESs are commonly used as a less precise alternative that still enables analyses producing meaningful results about the original system. Rehosting systems must decide how to model each layer of a target embedded system based on the desired analysis outcomes. These systems make different trade-offs regarding which layers should be emulated precisely, modeled with some approximation, or passed through to a physical embedded system.

In Table 4, we systematize existing work, contrasting how rehosting systems handle various abstraction layers of target systems. At a given layer, a particular system may choose to emulate a component (💻), replace it (◪), model it symbolically (x), pass it through to real hardware (🌐), or leave it unmodified (○).

We also identify four broad approaches to rehosting: pure emulation, hardware-in-the-loop emulation, symbolic modeling of peripherals, and hybrid systems that combine hardware-in-the-loop with symbolic peripheral models. We refer readers interested in the historical relationships between these works to Appendix C. Lastly, a concurrent survey by Wright et al. [83] provides additional information on rehosting fidelity and deployed analysis techniques for common rehosting tools and approaches.

component of an embedded system (i.e., an HES) is difficult and cannot scale. However, it is still challenging to identify and model a minimum set of necessary peripherals and features when building an RES. To simplify the process, an analyst may choose to ignore interactions with some peripherals, manually implement models, substitute peripherals with similar peripherals that have already been modeled, or fall back on other rehosting strategies.

Ideally, pure emulation approaches would refrain from modifying the OS, application, or function layers and only replace the hardware and physical layers with emulated models, supporting detection of software vulnerabilities (e.g., memory corruption) but not hardware vulnerabilities. In practice, emulation-based rehosting techniques often modify the firmware to simplify the rehosting process and, in doing so, excise some of the firmware from the VE. Type-1 firmware is commonly rehosted by combining a generic kernel with the firmware's file system to run user-space binaries [10, 20, 88].

With Type-2 and Type-3 firmware, one rehosting technique is to let an analyst manually define models of peripheral behavior at higher abstraction layers [15–17, 50, 54]. For example, HALucinator [15] hooks calls into vendor-specific Hardware Abstraction Layer (HAL) functions and replaces them with Python approximations of the requested hardware functionality. An alternative approach is to build peripheral models at lower abstraction layers [29, 37, 57]. For example, P2IM [29] observes MMIO access patterns in order to apply a pre-defined behavioral model.
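The high-level emulation idea behind HALucinator [15] can be sketched in C (HALucinator itself implements its handlers in Python). In this minimal sketch, the emulator calls a hypothetical `hle_dispatch` before executing each instruction; when the program counter reaches the entry of a hooked HAL function, a host-side handler writes the return value into `r0` and execution resumes at the caller, so the vendor HAL body and the MMIO accesses behind it never run. The hook addresses and the `cpu_state_t` layout are illustrative assumptions; `HAL_GetTick` and `HAL_GPIO_ReadPin` are STM32 HAL function names used only as example targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal ARM-style CPU state: only what the hooks need (hypothetical). */
typedef struct {
    uint32_t r[4];  /* r0-r3 carry arguments; r0 also carries the return value */
    uint32_t lr;    /* link register: address the hooked function returns to */
    uint32_t pc;    /* program counter */
} cpu_state_t;

typedef void (*hle_handler_t)(cpu_state_t *cpu);

typedef struct {
    uint32_t addr;         /* entry address of the HAL function in the firmware */
    const char *name;      /* HAL symbol, kept for logging */
    hle_handler_t handler; /* host-side replacement */
} hle_hook_t;

static uint32_t fake_ticks; /* state owned by the HAL_GetTick model */

/* Model of HAL_GetTick(void): a monotonically increasing tick count. */
static void hle_get_tick(cpu_state_t *cpu) { cpu->r[0] = ++fake_ticks; }

/* Model of HAL_GPIO_ReadPin(port, pin): report every pin as set (1). */
static void hle_gpio_read_pin(cpu_state_t *cpu) { cpu->r[0] = 1; }

/* Hypothetical entry addresses; a real tool would locate these functions
   via symbols or by matching library code in the binary. */
static const hle_hook_t hooks[] = {
    {0x08001000u, "HAL_GetTick", hle_get_tick},
    {0x08001200u, "HAL_GPIO_ReadPin", hle_gpio_read_pin},
};

/* Called by the emulator before executing the instruction at cpu->pc.
   Returns 1 if a HAL function was intercepted, 0 otherwise. */
int hle_dispatch(cpu_state_t *cpu) {
    for (size_t i = 0; i < sizeof hooks / sizeof hooks[0]; i++) {
        if (hooks[i].addr == cpu->pc) {
            hooks[i].handler(cpu);
            cpu->pc = cpu->lr & ~1u; /* emulate `bx lr`, clearing the Thumb bit */
            return 1;
        }
    }
    return 0; /* not hooked: the emulator executes the instruction normally */
}
```

Keeping the handlers in a table, keyed by HAL function rather than by peripheral register, is what allows one set of models to be reused across firmware that links the same vendor HAL.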
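Lower-layer modeling in the spirit of P2IM [29] can be illustrated with an MMIO handler that classifies registers from their access patterns. The sketch below is far simpler than P2IM's actual register categories and heuristics: a register the firmware writes is treated as a control register and echoes the last write, while a register read `POLL_THRESHOLD` times without ever being written is assumed to be a polled status register and reports every flag as set. The threshold, the all-ones reply, and the function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REGS 32
#define POLL_THRESHOLD 4 /* assumed: this many reads, no writes => status register */

typedef enum { REG_UNKNOWN, REG_CONTROL, REG_STATUS } reg_kind_t;

typedef struct {
    uint32_t addr;
    uint32_t value;   /* last value written by the firmware */
    unsigned reads, writes;
    reg_kind_t kind;
} mmio_reg_t;

static mmio_reg_t regs[MAX_REGS];
static size_t n_regs;

/* Find the bookkeeping entry for an MMIO address, creating it on first access. */
static mmio_reg_t *lookup(uint32_t addr) {
    for (size_t i = 0; i < n_regs; i++)
        if (regs[i].addr == addr) return &regs[i];
    if (n_regs == MAX_REGS) return NULL;
    regs[n_regs] = (mmio_reg_t){ .addr = addr, .kind = REG_UNKNOWN };
    return &regs[n_regs++];
}

/* MMIO write hook: a register the firmware writes is treated as a control register. */
void mmio_write(uint32_t addr, uint32_t value) {
    mmio_reg_t *r = lookup(addr);
    if (r == NULL) return;
    r->value = value;
    r->writes++;
    r->kind = REG_CONTROL;
}

/* MMIO read hook: control registers echo the last write; a register that is
   polled but never written is classified as a status register. */
uint32_t mmio_read(uint32_t addr) {
    mmio_reg_t *r = lookup(addr);
    if (r == NULL) return 0;
    r->reads++;
    if (r->kind == REG_CONTROL) return r->value;
    if (r->kind == REG_UNKNOWN && r->writes == 0 && r->reads >= POLL_THRESHOLD)
        r->kind = REG_STATUS;
    /* Assumed reply: report every flag as set so "ready" polling loops exit. */
    return r->kind == REG_STATUS ? 0xFFFFFFFFu : 0u;
}
```

Returning all-ones lets a typical busy-wait loop such as `while (!(SR & READY))` terminate, but it equally satisfies checks for error flags, so a fixed reply is only a starting point for a behavioral model.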
5.2.2 Hardware-in-the-Loop Schemes. This approach, often referred to as partial emulation, addresses the problem of missing peripheral models by forwarding device interactions to the real hardware or extracting live snapshots from the running device. The software layers of a system are moved into a VE and mutated such that OS-hardware interactions are passed through to unmodified hardware running in the physical world.

This method yields high-fidelity models of the hardware layer. In addition to physical hardware, this approach typically requires debugging access to the original execution environment, which is seldom present by default and often difficult to obtain. Additionally, the complexity added by the forwarding interface often leads to very high execution overheads (e.g., latency), limiting support for certain peripheral classes. Hardware-in-the-loop solutions generally do not scale, since each VE must be paired with a physical system. One notable exception is Pretender [36], which requires hardware only during a training phase in which peripheral models are generated from observations of real hardware behavior.

While most hardware-in-the-loop systems modify QEMU to use it as a VXE [36, 42, 43, 47, 49, 59, 69], relying purely on emulation is not a requirement, as demonstrated by Charm [77]. This system runs Android device drivers for ARM devices in a virtualized x86 environment and forwards MMIO to a physical device via USB 3.0. This design provides low-latency forwarding and high execution speeds, enabling Charm to fuzz device drivers.

5.2.3 Symbolic Abstractions. Another method to model hardware in VEs is to emulate software layers and consider all values read from hardware to be symbolic. These approaches require a symbolic VXE such as KLEE [7] or S2E [13]. FIE [23], for example, uses KLEE as a VXE to symbolically execute the firmware while allowing every valid interrupt to be raised at every instruction. This technique typically over-approximates hardware capabilities by assuming every peripheral is capable of returning the full range of possible values, which may lead to false-positive analysis results and cause state-space explosion, even for small firmware programs.

5.2.4 Hybrid Approaches. Some rehosting approaches combine hardware-in-the-loop schemes with symbolic execution to allow for more flexible analysis scenarios, such as bug finding or reverse engineering of hardware components.

Inception [18] is one such full-system hybrid solution. It consists of a custom JTAG debugger for near real-time hardware forwarding, a symbolic VXE based on KLEE, and a translator for merging lifted and compiled LLVM bitcode to cope with inline assembly. However, its implementation is tied to the ARM Cortex-M3 microcontroller and requires the firmware's source code, constraining its usability.

5.3 Effects of Different System Types

Naturally, approaches to crafting VEs differ depending on the target system. While Type-1 systems are generally the most complex class of embedded system, their operating systems commonly provide clear hardware abstractions. As a result, applications on these systems are self-contained, rarely interacting directly with hardware. In many cases, this means the kernel and drivers can be replaced to ease integration with a VE. This approach is often used by the systems with replaced (◪) OS layers in Table 4.

On the other hand, there are fewer hardware abstractions present in Type-2 systems and none in Type-3 systems. While the underlying hardware in these systems is generally simpler, the amount of hardware modeling required to build a working VE is often higher. Notably, to our knowledge, no existing approach to rehosting Type-2 systems makes use of abstractions provided by the OS.

6 THE REHOSTING PROCESS

We study the rehosting approaches outlined in § 5.2 to identify common patterns and articulate a common rehosting process. We model rehosting as the process of building a specification for an RES and then iteratively evaluating and refining the specification until its fidelity is satisfactory. Although the rehosting process does not fundamentally require iterative refinement, the information available even in a low-fidelity VE is commonly used, as it provides invaluable insights into the requirements of an RES.

Rehosting an embedded system $S$ begins with an initial specification of an RES. A specification $R$ is a 4-tuple that defines a rehosted system and consists of a VXE ($cpu$), firmware to execute ($fw$), models for its $n$ peripherals ($\{p^j \mid j \in \{1, 2, \ldots, n\}\}$), and miscellaneous configuration data ($d$).

6.1 Iterative Refinement

After the initial specification, $R_0$, is created, it can immediately be subjected to detailed observation and analysis, even before its behavior sufficiently mirrors $S$. Clearly, such analyses cannot produce meaningful results with respect to $S$ while these behaviors diverge, but they can instead guide the rehosting process by revealing which component of $R_0$ led to divergent behavior. For example, if the VXE ($cpu$) fails to execute an instruction in $fw$, errors in the construction of $cpu$ may become apparent. Alternatively, if an insufficient peripheral model returns a value that causes a divergence, a dynamic taint analysis can identify the deficiency in the model. As such, iterative evaluation and refinement greatly aid the generation of an accurate RES. Fig. 3 captures this process in detail. To represent the RES through iterations of this process, we define the $i$-th iteration to be:

$$R_i = (cpu_i, fw_i, \{p_i^j\}, d_i)$$

To build the initial specification, $R_0$, an analyst must obtain the firmware for the system, $fw_0$. Static analysis of $fw_0$ can identify the ISA of the original CPU, $\overline{cpu}$. To execute $fw_0$, a VXE must be available for its ISA. If none is available, one must be developed, a complex task even if the ISA is well documented.

In addition to a VXE for the ISA of $\overline{cpu}$, models for each peripheral with which $fw_0$ interacts, $\bar{p}^j$, must be developed as necessary. The aforementioned approaches to peripheral modeling fit into this model as follows. Pure emulation creates virtual models of peripherals comparable to the original peripherals: $\forall j: p^j \approx \bar{p}^j$. Hardware-in-the-loop configures $cpu$ to pass interactions with peripherals $p^j$ to $\bar{p}^j$ over a debugging channel between $cpu$ and $\overline{cpu}$: $\forall j: p^j = \bar{p}^j$, sans latency. Symbolic abstractions treat peripheral outputs as unconstrained symbolic values ($\forall j: p^j \supset \bar{p}^j$) and leverage symbolic execution to build a VE. Hybrid approaches combine techniques from the prior two approaches to support more flexible analyses.
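The specification $R = (cpu, fw, \{p^j\}, d)$ and the three peripheral-model relations above can be made concrete with a small C sketch. Each model exposes the set of values a firmware read may observe, so the relations become checkable: an emulated model returns one approximate value ($p^j \approx \bar{p}^j$), a forwarded model returns exactly what the physical peripheral returns ($p^j = \bar{p}^j$), and a symbolic model admits every 8-bit value, a superset of the real behavior ($p^j \supset \bar{p}^j$). The type names, the 8-bit register width, and the string-valued `cpu` and `config` fields are simplifying assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Set of possible values of an 8-bit register read, as a 256-bit bitmap. */
typedef struct { uint64_t bits[4]; } valset_t;

static void valset_add(valset_t *s, uint8_t v) { s->bits[v >> 6] |= UINT64_C(1) << (v & 63); }
static bool valset_has(const valset_t *s, uint8_t v) { return (s->bits[v >> 6] >> (v & 63)) & 1u; }
static int valset_size(const valset_t *s) {
    int n = 0;
    for (int v = 0; v < 256; v++) n += valset_has(s, (uint8_t)v);
    return n;
}

typedef enum { MODEL_EMULATED, MODEL_FORWARDED, MODEL_SYMBOLIC } model_kind_t;

/* One peripheral model p^j. For MODEL_EMULATED, `read` is an approximate
   software model; for MODEL_FORWARDED, it relays the access to the physical
   peripheral (p-bar^j) over a debug channel; MODEL_SYMBOLIC needs no callback. */
typedef struct {
    const char *name;
    model_kind_t kind;
    uint8_t (*read)(uint32_t offset);
} periph_model_t;

/* A rehosting specification R = (cpu, fw, {p^j}, d). */
typedef struct {
    const char *cpu;               /* VXE / ISA identifier (illustrative) */
    const uint8_t *fw;             /* firmware image */
    size_t fw_len;
    const periph_model_t *periph;  /* p^1 .. p^n */
    size_t n_periph;
    const char *config;            /* miscellaneous configuration data d */
} rehost_spec_t;

/* Values the firmware may observe when reading `offset` under model `p`. */
valset_t model_read(const periph_model_t *p, uint32_t offset) {
    valset_t s = {{0}};
    if (p->kind == MODEL_SYMBOLIC) {
        for (int v = 0; v < 256; v++) valset_add(&s, (uint8_t)v); /* unconstrained */
    } else {
        valset_add(&s, p->read(offset)); /* one concrete value */
    }
    return s;
}
```

In a test harness, a callback that returns a fixed value can stand in for the physical peripheral $\bar{p}^j$: the forwarded model's result set then equals that single value, while the symbolic model's set has 256 members and contains it.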
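The over-approximation of symbolic abstractions (§ 5.2.3) and the resulting state-space growth can be seen in a toy path count. The hypothetical driver routine below branches three ways on a status byte; `count_paths` forks on every branch outcome that some value in the read's domain can trigger, as a symbolic executor forks on feasible branches, with a scan over the domain standing in for the constraint solver. The rule that real hardware never sets bits 0 and 1 together is an assumption for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver logic branching on a status byte read from MMIO.
   Assumption: the real device never sets bits 0 and 1 at the same time. */
static int driver_branch(uint8_t status) {
    if ((status & 0x03) == 0x03) return 2; /* "impossible" state: error handler */
    if (status & 0x01) return 1;           /* data ready */
    return 0;                              /* idle */
}

/* Count execution paths through `reads` consecutive status reads, forking on
   each branch outcome that at least one value in `domain` can trigger.
   Sets *error_reached if any explored path enters the error handler. */
unsigned long count_paths(int reads, const uint8_t *domain, size_t n, bool *error_reached) {
    if (reads == 0) return 1; /* one complete path */
    bool feasible[3] = {false, false, false};
    for (size_t i = 0; i < n; i++) feasible[driver_branch(domain[i])] = true;
    unsigned long total = 0;
    for (int outcome = 0; outcome < 3; outcome++) {
        if (!feasible[outcome]) continue;
        if (outcome == 2) *error_reached = true;
        total += count_paths(reads - 1, domain, n, error_reached);
    }
    return total;
}
```

With an unconstrained domain of all 256 byte values, four reads yield $3^4 = 81$ paths and reach the "impossible" error handler, a false positive; restricting the domain to the two values the device is assumed to produce (0x00 and 0x01) yields $2^4 = 16$ paths and no error path.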
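The run, evaluate, and update loop of Fig. 3 can be sketched as a toy in C. Here only the peripheral models change between iterations, fidelity is judged by the first divergence between a ground-truth trace from $S$ and the trace of $R_i$, and each refinement fixes the model responsible for that divergence to produce $R_{i+1}$. The `spec_t` layout, the XOR-based stand-in for firmware, and the automatic `refine` step are hypothetical; in practice refinement is largely manual diagnosis, possibly aided by taint analysis.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_LEN 5

/* Toy specification: model[k] is the value returned by the k-th peripheral
   read (a stand-in for {p_i^j}); cpu, fw, and d are held fixed here. */
typedef struct {
    unsigned iteration;        /* i in R_i */
    uint8_t model[TRACE_LEN];
} spec_t;

/* Run R_i: the toy "firmware" derives each trace entry from a peripheral read. */
static void run_rehosted(const spec_t *r, uint8_t trace[TRACE_LEN]) {
    for (size_t k = 0; k < TRACE_LEN; k++) trace[k] = (uint8_t)(r->model[k] ^ 0x5A);
}

/* Index of the first mismatch between ground truth and the RES trace, or -1. */
static int first_divergence(const uint8_t *expected, const uint8_t *observed, size_t n) {
    for (size_t k = 0; k < n; k++)
        if (expected[k] != observed[k]) return (int)k;
    return -1;
}

/* R_i -> R_{i+1}: fix the model behind the divergence. The toy "analyst"
   simply inverts the firmware's transform. */
static void refine(spec_t *r, const uint8_t *expected, int at) {
    r->model[at] = (uint8_t)(expected[at] ^ 0x5A);
    r->iteration++;
}

/* Iterate until the traces match or the refinement budget is exhausted.
   Returns the number of refinements made, or -1 if fidelity stays unsatisfactory. */
int rehost(spec_t *r, const uint8_t expected[TRACE_LEN], unsigned max_iters) {
    uint8_t trace[TRACE_LEN];
    for (unsigned i = 0; i <= max_iters; i++) {
        run_rehosted(r, trace);
        int at = first_divergence(expected, trace, TRACE_LEN);
        if (at < 0) return (int)r->iteration; /* fidelity satisfactory */
        if (i == max_iters) break;
        refine(r, expected, at);
    }
    return -1;
}
```

For a ground-truth trace of `{0x5A, 0x5B, 0x5A, 0x00, 0xFF}` and an all-zero initial model, the toy firmware diverges at three positions, so `rehost` converges after three refinements; with a budget of two it gives up and returns -1.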
[Figure 3: The process of rehosting an embedded system. Iteration improves fidelity by updating the RES specification. Diagram steps: Obtain Firmware, Enumerate Peripherals, Determine ISA, Obtain Physical Device → Generate Initial Specification $R_0 = (cpu_0, fw_0, \{p_0^j\}, d_0)$ → Run Rehosted System $R_i$ → Evaluate Fidelity of $R_i$ (using Expected Behavior, Documentation & Prior Knowledge, and Ground-Truth Traces via an Access Output Channel) → if Improvements Necessary, Update Specification $R_i \to R_{i+1}$; otherwise, final $R_i$.]

While the initial $R_0$ may not be perfect, $cpu_0$ must be fairly accurate for the VE to run $fw_0$ in a meaningful way. By contrast, dynamic analysis can more easily identify incorrectly modeled peripherals or invalid configuration data. While it may be theoretically possible to build an accurate specification in a single attempt, in practice an iterative dynamic analysis process is what makes RES construction feasible.

$R_i$ is executed as follows, where $i = 0$ in the first iteration: $fw_i$ is run using $cpu_i$ with peripherals $\{p_i^j\}$ and configuration data $d_i$. Initial attempts to use $R_i$ will likely fail to sufficiently mirror $S$, but since rehosted systems are introspectable, the execution can be analyzed and various traces extracted to aid diagnostics. These traces may be collected by capturing the execution state at every instruction, at every system call, or at any other time of interest. Traces from $R_i$ have more diagnostic value if equivalent traces can be collected from the real device, $S$, via an output channel such as hardware debug support (e.g., JTAG) or extractable software logs (e.g., Linux ftrace).

Evaluating the fidelity of $R_i$ by comparing ground-truth traces or expected behavior from $S$ with observations of $R_i$ is essential for determining when the rehosting process has finished. Access to the physical device can make assessing fidelity easier. Without the physical device, assessments can only be approximated using prior knowledge of the system's expected behavior. For example, the gateway functionality (e.g., DHCP, NAT support) of a router's firmware could be tested, but high-level functional testing is a very coarse measure of fidelity.

If the fidelity of $R_i$ is unsatisfactory, it may be improved by modifying its components to produce $R_{i+1}$. This might mean correcting instruction decoding errors in $cpu_{i+1}$ or excising irrelevant parts of the firmware in $fw_{i+1}$. Alternatively, parameters or data in $d_{i+1}$ could be changed, or a peripheral model, $p_{i+1}^j$, may be updated. After making these changes, $R_{i+1}$ should be run so that iterative refinement can continue. After some number of refinements, the fidelity of $R_i$ should become satisfactory if all reasonable system modifications are considered. Once the rehosting process is complete, the final $R$ can be saved for use in dynamic analysis.

7 REHOSTING ROADMAP

In this section, we identify and discuss rehosting roadblocks that require addressing. We organize these into a roadmap for future research and development, with the hope of guiding the community toward a future in which rehosting is a well-understood, systematic, and scientific endeavor. We identify the following significant obstacles to the rehosting process that could be improved by future work:

(1) Building VXEs for new CPUs/ISAs;
(2) Widespread adoption of modeling standards;
(3) Handling peripheral behavior;
(4) Quantifying fidelity of a rehosted system; and
(5) Facilitating rehosting for complex systems.

7.1 Creating Virtual Execution Engines

Building a VE may require an analyst to implement a new VXE. When documentation or prior knowledge about an ISA is available, building a VXE is a straightforward, but significant, undertaking. Without such information, an in-depth analysis of system binaries may occasionally enable development of an emulator [28, 44], but this is an incredibly challenging task.

The difficulty of building VXEs, a critical piece of rehosting, means that the vast majority of existing work focuses on widely used and well-documented ISAs. This is reflected in the supported ISAs shown in Table 4, where the majority of tools focus solely on rehosting ARM targets, MIPS targets, or both in their evaluation. PowerPC, MSP430, and 8051 are targeted by only one tool each, and other commonly deployed architectures, such as Xtensa, AVR, RISC-V, and SPARC, are not represented at all.

Despite the current focus on ARM and MIPS on Linux, embedded systems in the wild are more diverse. While it is difficult to pinpoint the distribution of ISAs and OSs, prior studies provide an estimate by crawling the Internet for firmware images. For instance, in the dataset Chen et al. acquired for Firmadyne [10], 82% of firmware images ran on MIPS, 10% were ARM (neglecting endianness and bitwidth), and approximately 41% of all acquired images were based on Linux. Another large-scale study of a similar order of magnitude [19] reports a dataset in which 63% of the firmware is for ARM devices, 7% for MIPS, and 86% for Linux. These numbers differ largely due to differences in source selection and dataset processing, but both show that a significant amount of firmware is not written for ARM and MIPS on Linux.

Given the significant number of embedded devices not running Linux on ARM or MIPS, an important research task is to find ways of making it easier to create fast and accurate VXEs. There has been some encouraging recent work on automated synthesis of semantic specifications for specific ISAs such as x86 [34, 39]. TaintInduce [14] shows that higher-level semantics (i.e., taint propagation rules) can be dynamically inferred for an ISA. However, these approaches focus on simple instructions, such as arithmetic and basic logical operations. To create fully-fledged VXEs for real-world CPUs, complex instructions that manipulate hidden CPU states (e.g.,
instructions that change privilege levels) or perform complex high-level tasks (e.g., the AES-NI instructions on x86) may need to be handled.

Although various languages for describing and specifying CPU behavior can be used to create emulators and simulators (e.g., SLEIGH [33], SLED [66], and Verilog [40]), the process of creating these specifications is manual and error-prone. We believe research into extending automated synthesis to the complex instructions and ISAs found in embedded systems could ease the difficulty of rehosting systems that use proprietary, legacy, or merely unpopular CPUs and architectures.

7.2 Widespread Adoption of Modeling Standards

Of the standard formats commonly used today to describe hardware layouts and encode peripheral metadata (e.g., Device Trees [24], ACPI tables [31], SVD files [2]), none encode hardware behavior. If a format that does encode behavior were widely adopted, VEs could ingest abstract configurations that encode peripheral and CPU behaviors and use them as drop-in replacements or shims for full implementations of each.

This lack of widespread adoption is not due to a lack of standardization. The Open Virtual Platforms (OVP) project provides a collection of APIs for modeling peripherals and VEs, as well as a repository of generated models [53]. Components of embedded systems can be modeled using OVP's APIs in a standardized fashion and consumed by multiple emulators. OVP has partnered with numerous semiconductor design companies, including ARM and MIPS, to validate the behavior of its models. Another notable standard is SystemC [62], a hardware modeling platform for behavioral and system levels. Although hardware modeled in one of these standards cannot easily be converted into the other, emulators such as QEMU and OVPsim can be integrated with either of these standards [21, 58]. Future rehosting research should leverage these standards to build on existing hardware models and to produce results that could be consumed by subsequent work.

additional work is necessary to model complex peripherals. Alternatively, peripheral models could be generated by analyzing driver code using symbolic execution (e.g., Laelaps [8]), fuzzing, or static analysis.

Third, if sufficient instrumentation capabilities are available on a target system, peripherals could be probed with inputs and models constructed to describe the observed outputs. Subramanyan et al. [76] demonstrated that some cryptographic co-processors could be modeled automatically by creating peripheral templates and then using program synthesis from I/O samples to synthesize a working peripheral model. This approach, if extended to other common embedded peripherals, such as UARTs and timers, could greatly ease the burden of peripheral modeling.

A fourth approach is to replace a peripheral with another that is already modeled, remove it entirely, or replace it with a symbolic peripheral model. While prior approaches [12, 59, 68] have used symbolic peripheral models, they quickly encounter problems due to state explosion. If analysis indicates that replacing or removing a peripheral will have insignificant effects on a system's behavior, such a change can be an effective option.

A final approach is to intercept requests at higher layers of abstraction that ultimately lead to peripheral interactions and build models there. This approach, known as high-level emulation, precludes analysis of potentially vulnerable driver code but enables reuse of peripheral models when common peripheral interfaces can be identified across multiple embedded systems [15].

Even with these approaches, peripherals that provide unavailable, high-entropy data will often be impossible to model sufficiently. For example, a model of a storage controller built without knowledge of any of the underlying file system data could lead an RES to an unbounded state if any code from that file system were executed. Regardless of implementation specifics, new capabilities to handle device peripherals are necessary, given the asymmetry between peripheral diversity in the wild and the number of manually developed peripheral models.
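Synthesis of peripheral models from I/O samples, as demonstrated by Subramanyan et al. [76], can be illustrated with a toy enumerative search. The sketch below is not their tool: it tries three hypothetical 8-bit templates (XOR with a constant, addition of a constant, and rotate-left) with every constant, and returns the first instantiation consistent with all observed input/output pairs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t in, out; } io_sample_t;

/* Hypothetical template set; T_COUNT is a sentinel. */
typedef enum { T_XOR, T_ADD, T_ROTL, T_COUNT } template_t;

/* Instantiate template `t` with constant `k` and apply it to input `x`. */
static uint8_t apply(template_t t, uint8_t k, uint8_t x) {
    switch (t) {
    case T_XOR:  return (uint8_t)(x ^ k);
    case T_ADD:  return (uint8_t)(x + k);                  /* wraps modulo 256 */
    case T_ROTL: return (uint8_t)((x << (k & 7)) | (x >> ((8 - (k & 7)) & 7)));
    default:     return 0;
    }
}

/* Enumerative synthesis: find the first (template, constant) pair that
   reproduces every I/O sample. Returns true and fills *t_out, *k_out on success. */
bool synthesize(const io_sample_t *s, size_t n, template_t *t_out, uint8_t *k_out) {
    for (int t = 0; t < T_COUNT; t++) {
        int k_max = (t == T_ROTL) ? 8 : 256;   /* rotation amounts 0..7 */
        for (int k = 0; k < k_max; k++) {
            bool ok = true;
            for (size_t i = 0; i < n && ok; i++)
                ok = apply((template_t)t, (uint8_t)k, s[i].in) == s[i].out;
            if (ok) {
                *t_out = (template_t)t;
                *k_out = (uint8_t)k;
                return true;
            }
        }
    }
    return false;
}
```

Samples `{0x00→0x3C, 0xFF→0xC3, 0x12→0x2E}` yield XOR with `0x3C`, and `{0x01→0x08, 0x81→0x0C}` yield a rotate-left by 3; if no template fits, the search reports failure, which is where a richer template set or manual modeling would be needed.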
An RES that aims to precisely reproduce a physical system's behavior could be evaluated in terms of input-output equivalence. Providing an identical set of input states to the two versions of the system and comparing the outputs will reveal information about the fidelity of the RES. The confidence in this fidelity evaluation would depend on the scale and diversity of the input states tested as well as the precision of the output comparison. The output comparison could be made more precise by modifying the systems to produce intermediate outputs through techniques such as binary rewriting (e.g., Ramblr [81]) or by enabling non-standard logging. Perhaps an ideal technique for collecting intermediate outputs would build on program slicing. Although current whole-system slicing techniques (e.g., Virtuoso [26]) fail to handle inter-process communication, peripheral behavior, or process creation, solutions to these shortcomings would enable precise capture of every intermediate output. If slices on output buffers of interest could be extracted from and compared between an RES and its physical counterpart, differences may reveal inequivalencies.

7.5 Rehosting of Complex Embedded Systems

The current state of the art in rehosting revolves around firmware executed on a single CPU and is closely tied to the peripherals associated with that CPU. Yet, embedded systems usually consist of more than one processing unit, and cyber-physical systems can easily comprise multiple different CPUs, specialized Digital Signal Processors (DSPs), custom Application-Specific Integrated Circuits (ASICs), and configurable FPGAs [6, 55, 56].

Existing rehosting systems either ignore these components or model them as peripherals. However, similar to the multi-layer vulnerabilities discussed in § 2.1, some vulnerabilities may only be observable when the interactions between the components are captured thoroughly in a rehosted system. Hence, we believe future rehosting approaches will need to investigate computing units beyond traditional general-purpose processors, as well as the interaction between multiple rehosted components.

8 CONCLUSION

Rehosting is an important capability that enables the application of powerful dynamic analysis techniques, such as fuzzing and symbolic execution, to embedded systems. While prior work has developed ad-hoc solutions to rehosting in the pursuit of other research goals, we argue that rehosting is a research problem in its own right and, as such, should be approached systematically.

In this paper, we disambiguate the field of rehosting from emulation and show that building complete hardware emulation systems is both unnecessary to enable dynamic analysis of firmware and impossible to scale. We propose a taxonomy of rehosting strategies, systematically highlighting the differences between preliminary approaches. We identify the essential steps in the rehosting process and describe a high-level, iterative process for rehosting embedded systems. Finally, we describe unsolved rehosting challenges and propose a roadmap for future research in this space.

By improving the rehosting process, the security community will finally be able to apply decades of dynamic analysis research and mature tooling to the world of embedded systems. We hope that this systematization, together with our suggested future research directions, will spawn new lines of rehosting research, well-equipped to provide the foundation of successful security analysis platforms for current and future embedded systems.

ACKNOWLEDGMENTS

The authors wish to thank the following individuals for their contributions and support: Lindsey Wang, John Wilkinson, Douglas E. Stetson, William Hedberg, and Greta Lepore. This work was in part funded by ONR Awards N00014-15-1-2180 and N00014-19-1-2364; the National Science Foundation under Grants No. CNS-1916398 and CNS-1942793; NWO 628.001.030 “Tropics” and NWO NWA-ORC InterSect; and a research contract with Siemens AG.

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited. This material is based upon work supported by the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Under Secretary of Defense for Research and Engineering, Office of Naval Research, or the National Science Foundation.

REFERENCES

[1] M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J. A. Halderman, L. Invernizzi, M. Kallitsis, D. Kumar, C. Lever, Z. Ma, J. Mason, D. Menscher, C. Seaman, N. Sullivan, K. Thomas, and Y. Zhou. Understanding the Mirai botnet. In USENIX Security, 2017.
[2] ARM. System view description. https://www.keil.com/pack/doc/CMSIS/SVD/html/index.html.
[3] N. Artenstein. Broadpwn: Remotely compromising Android and iOS via a bug in the Broadcom Wi-Fi chipset, 2017.
[4] F. Bellard. QEMU, a fast and portable dynamic translator. In USENIX Annual Technical Conference, FREENIX Track, 2005.
[5] N. Brown. Device trees I: Are we having fun yet? https://lwn.net/Articles/572692/.
[6] P. Burgio, C. Alvarez, E. Ayguadé, A. Filgueras, D. Jimenez-Gonzalez, X. Martorell, N. Navarro, and R. Giorgi. Simulating next-generation cyber-physical computing platforms. Ada User Journal, 37, 2016.
[7] C. Cadar, D. Dunbar, D. R. Engler, et al. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, 2008.
[8] C. Cao, L. Guan, J. Ming, and P. Liu. Device-agnostic firmware execution is possible: A concolic execution approach for peripheral emulation. In ACSAC. ACM, 2020.
[9] A. Caraceni, F. De Cristofaro, F. Ferrara, S. Scala, and O. Philipp. Benefits of using a real-time engine model during engine ECU development. Technical report, SAE Technical Paper, 2003.
[10] D. D. Chen, M. Woo, D. Brumley, and M. Egele. Towards automated dynamic analysis for Linux-based embedded firmware. In NDSS, 2016.
[11] K. Cheng, Q. Li, L. Wang, Q. Chen, Y. Zheng, L. Sun, and Z. Liang. DTaint: Detecting the taint-style vulnerability in embedded device firmware. In IEEE/IFIP DSN, 2018.
[12] V. Chipounov and G. Candea. Reverse engineering of binary device drivers with RevNIC. In ACM EUROSYS.
[13] V. Chipounov, V. Kuznetsov, and G. Candea. S2E: A platform for in-vivo multi-path analysis of software systems. In ACM SIGARCH Computer Architecture News, 2011.
[14] Z. L. Chua, Y. Wang, T. Baluta, P. Saxena, Z. Liang, and P. Su. One engine to serve'em all: Inferring taint rules without architectural semantics. In NDSS, 2019.
[15] A. Clements, E. Gustafson, T. Scharnowski, P. Grosen, D. Fritz, et al. HALucinator: Firmware re-hosting through abstraction layer emulation. In USENIX Security, 2020.
[16] A. A. Clements, L. Carpenter, W. A. Moeglein, and C. Wright. Is your firmware real or re-hosted? A case study in re-hosting VxWorks control system firmware. In BAR, 2021.
[17] Comsecuris. LuaQEMU. https://github.com/comsecuris/luaqemu.
[18] N. Corteggiani, G. Camurati, and A. Francillon. Inception: System-wide security testing of real-world embedded systems software. In USENIX Security, 2018.
[19] A. Costin, J. Zaddach, A. Francillon, and D. Balzarotti. A large-scale analysis of the security of embedded firmwares. In USENIX Security, 2014.
[20] A. Costin, A. Zarras, and A. Francillon. Automated dynamic firmware analysis at scale: A case study on embedded web interfaces. In ACM ASIA CCS, 2016.
[21] F. Cucchetto, A. Lonardi, and G. Pravadelli. A common architecture for co-simulation of SystemC models in QEMU and OVP virtual platforms. In IEEE VLSI-SoC, 2014.
[22] Y. David, N. Partush, and E. Yahav. FirmUp: Precise static detection of common vulnerabilities in firmware. ACM SIGPLAN Notices, 2018.
[23] D. Davidson, B. Moench, T. Ristenpart, and S. Jha. FIE on firmware: Finding vulnerabilities in embedded systems using symbolic execution. In USENIX Security, 2013.
[24] Devicetree.org. Device tree specification v0.2. https://www.devicetree.org/specifications/, 2017.
[25] A. Dinaburg and A. Ruef. McSema: Static translation of x86 instructions to LLVM. In ReCon 2014 Conference, Montreal, Canada, 2014.
[26] B. Dolan-Gavitt, T. Leek, M. Zhivich, J. Giffin, and W. Lee. Virtuoso: Narrowing the semantic gap in virtual machine introspection. In IEEE SP, 2011.
[27] M. D. Ernst. Invited talk: Static and dynamic analysis: Synergy and duality. In ACM SIGPLAN-SIGSOFT PASTE, 2004.
[28] fail0verflow. Unprogramming: Intro. https://fail0verflow.com/blog/2012/unprogramming-intro/, 2012.
[29] B. Feng, A. Mera, and L. Lu. P2IM: Scalable and hardware-independent firmware testing via automatic peripheral interface modeling. In USENIX Security, 2020.
[30] S. Fleming. Accessing PCI Express configuration registers using Intel chipsets. Intel White Paper, (321090), 2008.
[31] UEFI Forum. Advanced configuration and power interface specification v6.2. https://uefi.org/sites/default/files/resources/ACPI_6_2.pdf, 2017.
[32] FTDI. Simplified description of USB device enumeration. https://www.ftdichip.com/Support/Documents/TechnicalNotes/TN_113_SimplifiedDescriptionofUSBDeviceEnumeration.pdf, 2009.
[33] Ghidra. SLEIGH: A language for rapid processor specification.
[34] P. Godefroid and A. Taly. Automated synthesis of symbolic instruction encodings from I/O samples. In ACM SIGPLAN PLDI, 2012.
[35] Z. Gui, H. Shu, F. Kang, and X. Xiong. FirmCorn: Vulnerability-oriented fuzzing of IoT firmware via optimized virtual execution. IEEE Access, 2020.
[36] E. Gustafson, M. Muench, C. Spensky, N. Redini, A. Machiry, Y. Fratantonio, D. Balzarotti, A. Francillon, Y. R. Choe, C. Kruegel, et al. Toward the analysis of embedded firmware through automated re-hosting. In RAID, 2019.
[37] L. Harrison, H. Vijayakumar, R. Padhye, K. Sen, and M. Grace. PartEmu: Enabling dynamic analysis of real-world TrustZone software using emulation. In USENIX Security, 2020.
[38] G. Hernandez, F. Fowze, D. J. Tian, T. Yavuz, and K. R. Butler. FirmUSB: Vetting USB device firmware using domain informed symbolic execution. In ACM SIGSAC, 2017.
[39] S. Heule, E. Schkufza, R. Sharma, and A. Aiken. Stratified synthesis: Automatically learning the x86-64 instruction set. In ACM SIGPLAN PLDI, 2016.
[40] IEEE Computer Society. Std 1364: IEEE Standard for Verilog Hardware Description Language. 1995.
[41] M. J. Jung and T. Ballo. STM-based introspection. Technical report, Sandia National Lab. (SNL-NM), Albuquerque, NM (United States), 2017.
[42] M. Kammerstetter, D. Burian, and W. Kastner. Embedded security testing with peripheral device caching and runtime program state approximation. In SECURWARE, 2016.
[43] M. Kammerstetter, C. Platzer, and W. Kastner. PROSPECT: Peripheral proxying supported embedded code testing. In ACM ASIA CCS, 2014.
[44] P.-H. Kamp. The crypto-cs-seti challenge: An un-programming challenge. http://web.archive.org/web/20160304030848/http://queue.acm.org/unprogramming.cfm, 2012.
[45] U. Kargén and N. Shahmehri. Speeding up bug finding using focused fuzzing. In ACM ARES, 2018.
[46] M. Kim, D. Kim, E. Kim, S. Kim, Y. Jang, and Y. Kim. FirmAE: Towards large-scale emulation of IoT firmware for dynamic analysis. In ACSAC. ACM, 2020.
[47] K. Koscher, T. Kohno, and D. Molnar. SURROGATES: Enabling near-real-time dynamic analyses of embedded systems. In USENIX WOOT, 2015.
[55] A. Malinowski and H. Yu. Comparison of embedded system design for industrial applications. IEEE Transactions on Industrial Informatics, 7, 2011.
[56] P. Marwedel. Embedded and cyber-physical systems in a nutshell. DAC.COM Knowledge Center Article, 2010.
[57] A. Mera, B. Feng, L. Lu, E. Kirda, and W. Robertson. DICE: Automatic emulation of DMA input channels for dynamic firmware analysis. To appear at IEEE SP, 2021.
[58] M. Monton, A. Portero, M. Moreno, B. Martinez, and J. Carrabina. Mixed SW/SystemC SoC emulation framework. In IEEE ISIE, 2007.
[59] M. Muench, D. Nisi, A. Francillon, and D. Balzarotti. Avatar²: A multi-target orchestration platform. In BAR, 2018.
[60] M. Muench, J. Stijohann, F. Kargl, A. Francillon, and D. Balzarotti. What you corrupt is not what you crash: Challenges in fuzzing embedded devices. In NDSS, 2018.
[61] J. Obermaier and S. Tatschner. Shedding too much light on a microcontroller's firmware protection. In USENIX WOOT, 2017.
[62] P. R. Panda. SystemC: A modeling platform supporting multiple design abstractions. In ACM ISSS, 2001.
[63] F. Peng, Z. Deng, X. Zhang, D. Xu, Z. Lin, and Z. Su. X-Force: Force-executing binary programs for security applications. In USENIX Security, 2014.
[64] S. E. Quadir, J. Chen, D. Forte, N. Asadizanjani, S. Shahbazmohamadi, L. Wang, et al. A survey on chip to system reverse engineering. ACM JETC, 2016.
[65] N. A. Quynh and D. H. Vu. Unicorn: Next generation CPU emulator framework. BlackHat USA, 2015.
[66] N. Ramsey and M. F. Fernandez. Specifying representations of machine instructions. Transactions on Programming Languages and Systems, 1997.
[67] N. Redini, A. Machiry, R. Wang, C. Spensky, A. Continella, Y. Shoshitaishvili, C. Kruegel, and G. Vigna. Karonte: Detecting insecure multi-binary interactions in embedded firmware. In IEEE SP, 2020.
[68] M. J. Renzelmann, A. Kadav, and M. M. Swift. SymDrive: Testing drivers without devices. In USENIX OSDI, 2012.
[69] J. Ruge, J. Classen, F. Gringoli, and M. Hollick. Frankenstein: Advanced wireless fuzzing to exploit new Bluetooth escalation targets. In USENIX Security, 2020.
[70] F. Saudel and J. Salwan. Triton: A dynamic symbolic execution framework. In SSTIC, 2015.
[71] E. J. Schwartz, T. Avgerinos, and D. Brumley. All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In IEEE SP, 2010.
[72] Y. Shoshitaishvili, R. Wang, C. Hauser, C. Kruegel, and G. Vigna. Firmalice: Automatic detection of authentication bypass vulnerabilities in binary firmware. In NDSS, 2015.
[73] O. Shwartz, A. Cohen, A. Shabtai, and Y. Oren. Shattered trust: When replacement smartphone components attack. In USENIX WOOT, 2017.
[74] D. Song, F. Hetzelt, D. Das, C. Spensky, Y. Na, S. Volckaert, G. Vigna, C. Kruegel, J.-P. Seifert, and M. Franz. PeriScope: An effective probing and fuzzing framework for the hardware-OS boundary. In NDSS, 2019.
[75] P. Stewin and I. Bystrov. Understanding DMA malware. In DIMVA, 2013.
[76] P. Subramanyan, Y. Vizel, S. Ray, and S. Malik. Template-based synthesis of instruction-level abstractions for SoC verification. In FMCAD, 2015.
[77] S. M. S. Talebi, H. Tavakoli, H. Zhang, Z. Zhang, et al. Charm: Facilitating dynamic analysis of device drivers of mobile systems. In USENIX Security, 2018.
[78] O. Thomas. Integrated circuit reverse engineering and code dumping, 2019.
[79] R. Torrance and D. James. The state-of-the-art in IC reverse engineering. In CHES. Springer, 2009.
[80] S. Vasile, D. Oswald, and T. Chothia. Breaking all the things: A systematic survey of firmware extraction techniques for IoT devices. In Springer CARDIS, 2018.
[81] R. Wang, Y. Shoshitaishvili, A. Bianchi, A. Machiry, J. Grosen, P. Grosen, et al. Ramblr: Making reassembly great again. In NDSS, 2017.
[82] J. Wetzels. The RTOS exploit mitigation blues. https://hardwear.io/document/rtos-exploit-mitigation-blues-hardwear-io.pdf, 2017.
[83] C. Wright, W. A. Moeglein, S. Bagchi, M. Kulkarni, and A. A. Clements. Challenges in firmware re-hosting, emulation, and analysis. ACM CSUR, 2021.
[84] T.-C. Yeh, G.-F. Tseng, and M.-C. Chiang. A fast cycle-accurate instruction set
[48] K. P. Lawton. Bochs: A portable pc emulator for unix/x. Linux Journal, 1996. simulator based on qemu and systemc for soc development. In IEEE MELECON,
[49] H. Li, D. Tong, K. Huang, and X. Cheng. Femu: A firmware-based emulation 2010.
framework for soc verification. In IEEE/ACM/IFIP CODES+ISSS, 2010. [85] H. Ying, Y. Zhang, L. Han, Y. Cheng, J. Li, X. Ji, and W. Xu. Detecting buffer-
[50] W. Li, L. Guan, J. Lin, J. Shi, and F. Li. From library portability to para- overflow vulnerabilities in smart grid devices via automatic static analysis. In
rehosting:natively executing microcontroller softwareon commodity hardware. In IEEE ITNEC, 2019.
NDSS, 2021. [86] J. Zaddach, L. Bruno, A. Francillon, and D. Balzarotti. Avatar: A Framework to
[51] G. Likely. Linux and the device tree: The linux usage model for device tree data. Support Dynamic Security Analysis of Embedded Systems’ Firmwares. In NDSS,
[52] Y. Liu, H.-W. Hung, and A. A. Sani. Mousse: a system for selective symbolic 2014.
execution of programs with untamed environments. In ACM EuroSys, 2020. [87] V. A. Zakharov. The equivalence problem for computational models: decidable
[53] I. S. Ltd. Openvirtualplatforms. http://www.ovpworld.org/, 2019. and undecidable cases. In Springer MCU, 2001.
[54] D. Maier, L. Seidel, and S. Park. Basesafe: Baseband sanitized fuzzing through [88] Y. Zheng, A. Davanian, H. Yin, C. Song, H. Zhu, and L. Sun. Firm-afl: High-
emulation. In ACM WiSec, 2020. throughput greybox fuzzing of iot firmware via augmented process emulation. In
USENIX Security, 2019.
A EXAMPLE EMBEDDED SYSTEMS
Table 5 describes a set of embedded systems, listing the CPU architecture, operating system, and notable peripherals of each.

Table 5: The embedded systems zoo.
System                         CPU    OS        Notable Peripherals
Canon PowerShot G11            ARM    DryOS     Image sensor, HDMI
Gen. 1 Apple AirPort Express   PPC    VxWorks   WiFi, Ethernet, USB
Google Nest Thermostat E       ARM    Linux     WiFi, Temp. sensor
HP M551dn Printer              ARM    WinCE     Ethernet, motor, daughterboard
MikroTik RouterBoard 192       MIPS   RouterOS  Ethernet, speaker
Philips Hue Lightbulb          AVR    Custom    ZigBee and Bluetooth radios

B DETAILED SURVEY METHODOLOGY
B.1 Availability: Execution Engines
Our DTB corpus contains “manufacturer,model” tuples describing the CPU used by each hardware platform. For example, the DTB for the exynos5250 SoC indicates that it uses an ARM Cortex-A15 CPU (compatible = "arm,cortex-a15";). After we strip the manufacturer name, this corresponds to QEMU’s cortex-a15 CPU. Because strings describing the same CPU model may differ slightly, we strip the manufacturer name from each compatible string and then use fuzzy string matching (Levenshtein distance) to determine processor support. For example, QEMU’s mpc8541e_v11 processor name matches the manufacturer-less PPC compatible string 8541. Matches were manually reviewed for accuracy. Because QEMU’s support for ARM Cortex-M processors and SoCs is extremely limited (only 2 SoCs), we do not consider the systems described by our SVD corpus in this analysis.

We never assume that a core can be safely swapped for another of the same ISA version, e.g., replacing a cortex-a9 with a cortex-a15 (both ARMv7-A CPUs). Even though QEMU does not model micro-architectural behavior, such a substitution can lead to a range of errors, including illegal instructions, differing sets of configuration registers, and variations in MMU features.

B.2 Diversity: Unique Peripherals
For our DTB corpus, we extract compatible strings from each system’s description. The Device Tree Standard indicates that the compatible property of a node, a key whose value is a list of precedence-ordered strings in the format "manufacturer,model", should be used for device driver selection. For example, when the Linux kernel parses a device tree node with property compatible = "samsung,exynos3250-pmu", it determines that it must load the device driver implemented in drivers/soc/samsung/exynos-pmu.c. If a node’s compatible property lists multiple strings, we consider only the first. This reflects the kernel’s selection precedence and ensures that no physical peripheral is counted more than once. DTB nodes that do not contain a compatible property are ignored. This conservative approach ensures that we only count peripherals an OS can certainly interact with directly.

For the SVD corpus, identifying unique peripherals is less straightforward, as peripheral names need not directly correspond to driver code. We consider two peripherals to be the same if they have the same register layout within the memory-mapped I/O (MMIO) interface to the peripheral, raise the same interrupt signals, and have similar names (normalized Levenshtein distance ≤ 0.20). Again, this is a fairly conservative approach: divergences in register layouts and interrupts necessitate different peripheral behavior, but similarities do not guarantee that two configurations refer to the same peripheral.

Because QEMU has no central table of supported peripherals, we programmatically collect and clean the output of the -device help flag for all board definitions of every surveyed architecture. OVPsim peripheral counts are determined by the available models advertised on the project’s homepage [53].

B.3 Complexity: Driver SLOC as a Proxy
Submitting Device Tree files to the Linux kernel does not obligate submitting source code for the drivers named therein; we observed that 20-58% of drivers were open source, depending on architecture. When totaling SLOC per SoC, we use precise counts for open-source drivers and the architecture-specific average SLOC per driver for closed-source drivers.

Manual analysis of a small subset (n = 20) of QEMU-implemented peripherals demonstrated a positive correlation between driver SLOC and peripheral implementation SLOC, as shown in Fig. 4.

Figure 4: SLOC of Peripheral Model vs Device Driver.

Unlike our tabulation of peripheral diversity, here we consider all compatible strings present in a node, not just the first (preferred) one. This aids completeness, as we do not miss the opportunity to count SLOC for any open driver. SLOC is computed with pygount, a Python library that supports C syntax and does not count comments or empty lines.

Automating SLOC measurement for QEMU peripheral implementations is infeasible, as they can be incomplete, tightly coupled with QEMU-internal objects, or spread across hierarchical source files.
For example, hw/cpu/a9mpcore.c extends QEMU’s CPU class to
add support for multiple Cortex-A9 peripherals (GIC, SCU, timers, etc.).

B.4 Aside: Peripheral Support across QEMU Versions
We examined three major versions of QEMU released approximately a year apart: v2.11.1 (February 2018, the latest in the Ubuntu 18.04 repositories), v4.2.0 (December 2019), and v5.2.0 (December 2020). Table 6 shows very little increase in supported peripherals relative to corpus peripheral diversity (see Table 2 in § 4).

Table 6: Total Peripherals Supported by QEMU.
Arch     v2.11.1   v4.2.0   v5.2.0
ARM      227       321      337
ARM64    279       322      338
MIPS     153       186      192
PPC      160       210      216

Regardless of which QEMU version we contrast the § 4 results against, the outcome is the same: modern QEMU is not meaningfully more capable of emulating embedded systems than it was 2.5 years ago. Despite the increasing attention the research community has given to rehosting, there has been no meaningful change on the HES front. Our Monte Carlo simulation demonstrated that the problem of robust peripheral support is intractable going forward, and historical data complements this conclusion by demonstrating an insignificant rate of support increase.

C HISTORICAL TAXONOMY OF PRIOR WORK
Table 4 does not capture temporal or evolutionary relationships between prior works. To address this, we present a timeline of rehosting solutions and rehosting-related work in Fig. 5. Target system type, source dependency, and primary goal (rehosting or other) are encoded in the figure.

[Figure 5: Timeline (<2010 to 2021) of rehosting solutions and rehosting-related work, grouped into Hybrid Approaches, Symbolic Abstractions, Hardware-in-the-loop Emulation, and Emulation-only. The legend encodes system focus (rehosting or other), targeted devices (desktop, Type-I, Type-II, Type-III), and type of code (source or binary).]
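Appendix B.1 strips the manufacturer prefix from compatible strings before using Levenshtein distance. Appendix B.2 treats two SVD peripherals as identical when their register layout, interrupts, and names (normalized distance ≤ 0.20) agree. Both rules can be illustrated with a minimal C sketch. This is not the authors' tooling. The following are all assumptions for illustration:

- the helper names `strip_vendor`, `best_cpu_match`, `same_peripheral`, and `svd_peripheral`;
- the 128-character name bound;
- modeling a register layout as an ordered list of offsets;
- normalizing by the longer name's length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed upper bound on identifier length for this sketch. */
#define MAX_NAME 128

/* Levenshtein distance (unit-cost insert, delete, substitute),
 * computed with two rolling rows of the dynamic-programming table. */
static size_t levenshtein(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    assert(la < MAX_NAME && lb < MAX_NAME);
    size_t prev[MAX_NAME], cur[MAX_NAME];
    for (size_t j = 0; j <= lb; j++)
        prev[j] = j;
    for (size_t i = 1; i <= la; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            size_t del = prev[j] + 1;
            size_t ins = cur[j - 1] + 1;
            size_t best = sub < del ? sub : del;
            cur[j] = best < ins ? best : ins;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}

/* Distance divided by the longer string's length (assumed normalization). */
static double normalized_levenshtein(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t longest = la > lb ? la : lb;
    return longest ? (double)levenshtein(a, b) / (double)longest : 0.0;
}

/* "arm,cortex-a15" -> "cortex-a15"; strings without a comma are returned unchanged. */
static const char *strip_vendor(const char *compatible) {
    const char *comma = strchr(compatible, ',');
    return comma ? comma + 1 : compatible;
}

/* QEMU CPU name closest to the vendor-stripped compatible string.
 * Returns NULL for an empty candidate list; results still need manual review. */
static const char *best_cpu_match(const char *compatible,
                                  const char *const *qemu_cpus, size_t n) {
    const char *model = strip_vendor(compatible);
    const char *best = NULL;
    size_t best_dist = (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        size_t d = levenshtein(model, qemu_cpus[i]);
        if (d < best_dist) {
            best_dist = d;
            best = qemu_cpus[i];
        }
    }
    return best;
}

/* Simplified SVD peripheral: name, register offsets in its MMIO window, IRQ numbers. */
typedef struct {
    const char *name;
    const unsigned *reg_offsets;
    size_t n_regs;
    const unsigned *irqs;
    size_t n_irqs;
} svd_peripheral;

/* Element-wise equality of two ordered lists. */
static bool same_list(const unsigned *a, size_t na, const unsigned *b, size_t nb) {
    return na == nb && (na == 0 || memcmp(a, b, na * sizeof a[0]) == 0);
}

/* Same register layout, same interrupts, and names within 0.20 normalized distance. */
static bool same_peripheral(const svd_peripheral *p, const svd_peripheral *q) {
    return same_list(p->reg_offsets, p->n_regs, q->reg_offsets, q->n_regs) &&
           same_list(p->irqs, p->n_irqs, q->irqs, q->n_irqs) &&
           normalized_levenshtein(p->name, q->name) <= 0.20;
}
```

Consider the candidates {"cortex-a9", "cortex-a15", "mpc8541e_v11"}. Here `best_cpu_match("8541", ...)` picks mpc8541e_v11 (distance 8, against 9 for both Cortex names), mirroring the B.1 example. Under the same layout and interrupts, USART1 and USART2 (normalized distance 1/6) would be merged, while UART4 (2/6) would not. Because a strictly closest match is always returned, a threshold or manual review is still needed to reject CPUs QEMU simply lacks.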
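Appendix B.3 totals per-SoC driver SLOC from two sources: exact counts for open-source drivers and an architecture-specific average for closed-source ones. A minimal sketch of that rule follows. The following are assumptions for illustration:

- the `driver_info` record and the function names;
- computing the average over the open-source drivers of the same architecture.

The per-driver SLOC values themselves would come from a counter such as pygount.

```c
#include <assert.h>
#include <stddef.h>

/* One driver named by an SoC's device tree (hypothetical record). */
typedef struct {
    int is_open_source; /* nonzero if driver source is available */
    long sloc;          /* measured SLOC; ignored for closed-source drivers */
} driver_info;

/* Mean SLOC over the open-source drivers of one architecture (assumed basis
 * for the architecture-specific average). Returns 0.0 if none are open. */
static double arch_average_sloc(const driver_info *drivers, size_t n) {
    assert(drivers != NULL || n == 0);
    long total = 0;
    size_t n_open = 0;
    for (size_t i = 0; i < n; i++) {
        if (drivers[i].is_open_source) {
            total += drivers[i].sloc;
            n_open++;
        }
    }
    return n_open ? (double)total / (double)n_open : 0.0;
}

/* Per-SoC total: exact SLOC for open-source drivers, the architecture
 * average substituted for each closed-source driver. */
static double soc_sloc_estimate(const driver_info *soc_drivers, size_t n,
                                double arch_avg) {
    assert(soc_drivers != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += soc_drivers[i].is_open_source ? (double)soc_drivers[i].sloc
                                               : arch_avg;
    return total;
}
```

As an example, take an architecture whose open drivers measure 100 and 300 SLOC, giving an average of 200. An SoC on that architecture with one 100-SLOC open driver and two closed drivers is then estimated at 100 + 200 + 200 = 500 SLOC.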