Undervolting ARM Processors
Christian Göttel∗ , Konstantinos Parasyris† , Osman Unsal‡ , Pascal Felber∗ , Marcelo Pasin∗ , Valerio Schiavoni∗
† Lawrence Livermore National Laboratory, [email protected]
‡ Barcelona Supercomputing Center, [email protected]
∗ Université de Neuchâtel, Switzerland, [email protected]
Abstract—Latest ARM processors are approaching the computational power of x86 architectures while consuming much
Table I: List of server-grade and mimicking ARM processors with their supported ISA. ‘*’: used in our evaluation (see §V).
arXiv:2107.00416v2 [cs.DC] 2 Jul 2021
©2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including
reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists,
or reuse of any copyrighted component of this work in other works. Presented in the 40th IEEE International Symposium on Reliable Distributed Systems
(SRDS ’21).
applications (see our threat model in §III). The main research questions we address in this work are:
RQ2: Does a cloud user have the ability to uncover such an undervolting strategy?
To answer those questions, we need to lay the foundation to better understand consequences of (arbitrary) undervolting, both from the cloud provider and client perspective. In fact, depending on supply voltage, frequency, load, and temperature of the CPU, execution steps can yield erroneous computations. While recent attacks [17, 18] have demonstrated how undervolting can be effectively exploited to gain access to sensitive information, we deal with a different threat model: the infrastructure is undervolted on purpose by a powerful attacker (i.e., the cloud provider), at the risk of exposing hard-to-detect unreliable computing instances for users. Without physical access to instances, nor being able to directly manipulate the supply voltage or frequency, a user’s options remain limited. Nevertheless, a user can adjust the processor’s load and operating performance points (§II-B) to influence its heat dissipation. In order to operate under full load, the processor has to be set to the highest operating performance point, which implies the highest frequency and supply voltage setting. Consequently, undervolted processors present higher probability for erroneous computations to occur because they are unable to maintain high frequencies. This probability is further increased by the propagation delay due to high operating temperature.
If erroneous computations result in faults, one can observe application crashes, or kernel panics, leading to cloud instance unavailability. While service level agreements (SLA) [19] typically cover such scenarios, a malicious provider might try to balance its actions to only yield erroneous computations not resulting in faults, basically overcoming SLA protections. For this reason, we designed a non-selective fault injection method for detecting the scrooge attack. The sole purpose of the detection method is to yield intentional application crashes or kernel panics on undervolted instances such that the user is covered by the SLA. While interesting, we consider cloud providers or users exploiting undervolting to leak sensitive information [20, 21] to be out of scope of this work.
Interestingly, ARM-based Raspberry Pis have already been collocated in cloud data centers [22]. With the intent to reproduce and study the dynamics of such deployments (and, to a smaller scale, mimic AWS using ARM nodes), we first study the effects of undervolting on three different ARM processors, focusing on energy savings. Figure 1 shows different normalized energy to throughput ratios (ETR) [12] obtained with ARM Cortex-A processors for the three latest Raspberry Pi models (3B, 3B+, and 4B [9]) at their lowest operational undervolting setting (−75 mV for 3B and 3B+, and −15 mV for 4B) compared to nominal voltage (i.e., 0 mV, no undervolting). As shown, undervolting directly influences energy spent per operation, without negatively affecting throughput. Lower normalized ETR values indicate higher energy efficiency for a given throughput. On average across different throughput values we achieved by undervolting 5 % to 13 % better energy efficiency on the 3B and 3B+ and 0 % to 3 % on the 4B. In essence, these results suggest that a cloud provider can indeed undervolt ARM-based instances, without directly compromising the observed performance.
Figure 1: Normalized energy to throughput ratio (ETR) for undervolted Raspberry Pi model B platforms operating at maximum throughput. (Axes: throughput [Mop/s] against normalized energy; series: 3B, 3B+, 4B.)
Our contributions are as follows:
• We describe a novel attack scenario based on undervolting by a scrooge cloud provider to lower energy costs.
• We demonstrate how cloud users can with a certain probability detect this novel scrooge attack.
• We provide a temperature-based guardband analysis to narrow down the operation voltage range of an ARM-based processor (§V-D).
• We describe how our analysis can be used to automatically identify undervolted instances (§V-E)
• We present potential energy gains of undervolting systems using a reliability benchmark (§V-F). In general gains can reach up to 37 %.
This practical experience report is organized as follows. Section II provides background on the low-level mechanisms used to undervolt a processor and the Raspberry Pi platform as well as the associated side-effects. Our threat model is given in Section III. We overview our detection method in Section IV. Our in-depth experimental evaluation is presented in Section V. We discuss and review related work in Section VI and Section VII, before concluding in Section VIII.
II. BACKGROUND
This section defines more precisely a few concepts related to power management (§II-B), i.e., frequency and voltage scaling and associated techniques such as Dynamic Voltage and Frequency Scaling (DVFS) and Adaptive Voltage Scaling (AVS). In §II-D we explain the relation between such techniques and how they affect the overall reliability of a system.
A. ARM in data centers
Collocation offers allow users to either ship or buy Raspberry Pis in order to deploy lightweight workloads on this
low-energy hardware and thus free up resources on high-energy x86 hardware. Furthermore, Raspberry Pis are the size of credit cards and have much lower cooling demands, which allows hosting a large number of units in a single rack. Such off-the-shelf hardware setups allow for large-scale node deployments as needed in data processing or cloud computing workloads. While off-the-shelf hardware typically lacks in performance and storage capability, its energy consumption remains comparable to server hardware.
Figure 2: Energy comparison of off-the-shelf and server-grade devices on CPU-bound and memory-bound workloads. (Panels: CPU-bound and Memory-bound; y-axis: Energy [J/op]; devices: 3B, 3B+, 4B, EPYC, Broadwell, Kaby Lake, Harpertown.)
Figure 2 compares the energy consumption of ARM-based off-the-shelf hardware (i.e., three different Raspberry Pi models) against server-grade hardware using different x86 architectures. We run a cryptographic (CPU-bound) and a memory allocation (Memory-bound) stressor while measuring the entire device power consumption. The x86 processors used were an AMD EPYC and three different Intel Xeon processor generations, i.e., Broadwell, Kaby Lake and Harpertown. This is a direct comparison of the execution of two distinct binaries of the same source code on two different architectures based on a common metric (J/op). We observe no major difference for CPU-bound operations between different architectures [23]. However, memory-bound operations on off-the-shelf hardware have higher energy consumption. In the case of the Raspberry Pi models, these are due to cache size and memory transfer rate. Nevertheless, off-the-shelf hardware achieves lower energy consumption for both operations compared to older server-grade x86 hardware, i.e., Harpertown. These results indicate that replacing old x86 hardware with recent off-the-shelf ARM-based nodes in data centers will result in energy savings.
B. Power management
The power dissipated by an integrated circuit depends on static power (leakage current) and dynamic power (switching power). Since about 2005 [24] the power dissipation contribution of dynamic power has become much higher than static power. Nowadays, with the decreased transistor size and lowered threshold voltages, static power is becoming more and more important [25]. In the following, we outline techniques to reduce dynamic power.
Frequency scaling regulates (dynamically) the frequency of an integrated circuit in order to change performance, conserve power or reduce the amount of heat dissipation. Reducing the frequency at a constant voltage is called underclocking or throttling, while increasing the frequency is called overclocking. The dynamic power dissipated by an integrated circuit over a period of time is given by $P = CV^2 f$, where C is the capacitance, V is the voltage, and f is the frequency. Thus, increasing the frequency results in a higher power consumption and operating temperature.
Voltage scaling is an open loop system, in which the voltage of an integrated circuit is regulated (dynamically) based on an external setting. Increasing or decreasing the voltage while keeping the frequency constant is called overvolting and undervolting, respectively. Regulating the voltage enables increasing the frequency or conserving power of an integrated circuit, which is particularly useful for battery-powered devices. Changing the voltage influences the rate at which capacitances can be charged and discharged. Thus voltage determines the speed and frequency at which an integrated circuit can be operated. Modern operating systems do not provide direct support to adjust a processor’s voltage individually. The processor’s voltage is either regulated by model-specific registers [26] or through firmware.
DVFS is the simultaneous software-controlled regulation of voltage and frequency scaling of an integrated circuit. Depending on the process variation (variation of integrated circuits when fabricated) ARM system on a chip (SoC) manufacturers specify a set of operating performance points (OPPs) under worst case conditions. These OPPs are pairs of clock frequencies and voltages under which the integrated circuit is operational with a sufficiently large margin while taking into account thermal conditions. In Linux the CPUFreq kernel driver [27] will choose a set of OPPs based on a specified governor. DVFS has been extensively studied [25, 28, 29] to accelerate multi-threaded applications. x86 manufacturers use their own DVFS implementations [30, 31, 32].
AVS [33] is a closed loop system where the voltage is regulated based on its process variation, aging and a feedback loop of sensor data. A hardware monitor or software backed by sensor data determines if the changes made to the system are sufficient or if additional changes are necessary. AVS requires support from both the processor and the power regulators, in order to adjust the voltage accordingly. The Raspberry Pi models B used in this report are equipped with an AVS system.
C. Raspberry Pi
The Raspberry Pi’s firmware is configured at boot time by a text file containing property-value pairs. For example, the frequency and voltage can be set in this configuration file. A particularity is that voltages can only be set to a nominal offset in steps of 25 mV. This configuration is then parsed by the firmware. This undervolting configuration is specific to the Raspberry Pi and other hardware can more easily be undervolted dynamically at runtime. Notice that the requested CPU frequency in the operating system can deviate from the actual frequency regulated by the firmware. This is in particular the case if the device reaches the thermal hard limit at 85 °C. Additionally, the 3B+ has a soft limit temperature at 60 °C that will throttle the CPU frequency and voltage.
ensures that the undervolted state of the cloud infrastructure remains unknown to users. A cloud provider must find the sweet spot [16] for the undervolt configuration in or near the critical region to provide sufficiently stable instances.
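To get a rough sense of what such an undervolt configuration is worth to the provider, the dynamic power relation $P = CV^2 f$ from §II-B can be evaluated at constant capacitance and frequency. The following is a minimal sketch, not the authors' tooling: the function names are hypothetical, the nominal voltage of 1.3750 V is assumed from level 0 of Table II, and static power and the rest of the board are ignored, so the result is not comparable to the measured whole-device savings.

```python
"""Rough estimate of dynamic power saved by undervolting (P = C * V^2 * f)."""

STEP_MV = 25  # the firmware accepts voltage offsets in 25 mV steps (§II-C)


def offset_to_steps(offset_mv: int) -> int:
    """Convert a millivolt offset into a whole number of 25 mV steps."""
    if offset_mv % STEP_MV != 0:
        raise ValueError(f"offset must be a multiple of {STEP_MV} mV")
    return offset_mv // STEP_MV


def dynamic_power_ratio(v_nominal: float, offset_mv: int) -> float:
    """Return P_undervolted / P_nominal, assuming C and f stay constant."""
    v_undervolted = v_nominal + offset_mv / 1000.0
    if v_undervolted <= 0:
        raise ValueError("offset drives the supply voltage to zero or below")
    # C and f cancel out, so only the squared voltage ratio remains.
    return (v_undervolted / v_nominal) ** 2


if __name__ == "__main__":
    v_nominal = 1.3750  # assumption: nominal V_arm of the 3B+ (Table II, level 0)
    ratio = dynamic_power_ratio(v_nominal, -75)
    print(f"steps: {offset_to_steps(-75)}, dynamic power ratio: {ratio:.3f}")
```

Under these assumptions a −75 mV offset corresponds to three steps and lowers dynamic power by roughly 11 %, which illustrates why even small offsets are attractive at data center scale.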
Figure 3: State machine with cloud provider actions to obfuscate undervolted machine configuration. (Labels in the diagram: deploy, boot, reboot, shutdown, deployed, voltage reading, firmware request; steps ❶ to ❻.)
or shutdown of the machine configurations might have to be swapped back ❷ again.
Any CPU voltage reading request needs to be intercepted and substituted by a plausible nominal voltage value. This will typically involve a kernel driver that will handle the communication with the firmware or accessing model-specific registers. The request can then be intercepted directly in the user space tool or the kernel driver ❸. From the kernel driver the request is forwarded ❹ and the actual undervolted CPU voltage value is returned to the kernel driver ❺. The kernel driver then substitutes this value by a nominal voltage value, e.g., by adding the undervolt offset to the value. The alleged nominal voltage value is then returned to the user ❻. A more costly but stealthier variant involves the trusted operating system, to which the cloud provider could delegate voltage reading requests instead of a kernel driver.
If users are allowed to deploy their own kernels, then the cloud provider needs a different approach. Voltage reading requests can no longer be intercepted in kernel space. Instead, the cloud provider needs to use the hypervisor to intercept CPU voltage requests and substitute them similarly to the kernel driver approach.
In our threat model we assume that the cloud provider will make use of these mechanisms and obfuscate as much as possible the undervolted state of the infrastructure from users, a practical effort with significant benefits. Without access to the firmware configuration or to any untampered message exchange with the CPU voltage regulating mechanism, a user can never be sure to obtain a genuine voltage reading.
D. Relevance of discussed techniques
While a scrooge cloud provider has powerful mechanisms in place to hide its undervolted instances, the curious user can still expose this misbehaviour. For instance, the processor’s frequency and package temperature are viable options to test for undervolted conditions. The techniques presented in Section V demonstrate to which extent users can deploy applications stressing aforementioned options on instances and how accurately conclusions can be drawn.
IV. SCROOGE ATTACK DETECTION
We describe the user’s detection method in this section. Furthermore, we describe under which conditions the detection method works and where difficulties may arise.
We assume that users cannot trust any firmware or system reading on instances. As such, users have no reference to any parameters for adjusting the detection method to the attack. For this reason, users can make use of simple CPU-bound programs that put the processor under maximum load while monitoring for faults. Inspired by [17] we propose implementing an arithmetic computation (i.e., multiplication) for which we can validate the result. First we generate two random numbers which are then multiplied until the instance crashes while alternating the position of multiplier and multiplicand. Murdock et al. [17] have observed that the position of the multiplier and the multiplicand can lead to a faulting instruction. After each multiplication the result is compared to the original result. While the processor operates at maximum load it will run at the highest frequency and dissipate heat, which will raise its temperature. Under these conditions we achieve the highest probability to inject faults related to timing violations. Depending on the complexity of the RISC circuitry in the ARM processor, certain instructions are more likely to fault than others. To this end, the detection method might not inject faults in its own computation (due to its simple nature), but more likely in other processes. This behavior is favorable, as it allows running the detection method until the instance itself becomes unavailable due to multiple critical faults in system relevant processes. Thus, we detect an undervolted cloud instance using the detection method by gradually failing processes to crash the instance and make it unavailable.
The detection method depends strongly on how aggressively machines are undervolted and the cooling system employed by the cloud provider. The less a machine is undervolted, the higher the temperature needs to be raised by the detection method to fault processes and vice versa. A good cooling system is a lesser problem than a weakly undervolted machine. With a good cooling system the detection method requires a longer time to raise the processor’s temperature. On the one hand, implementing a soft limit temperature throttle in order to prevent this detection method is not an ideal solution. Users are less inclined to pay for a service which underperforms compared to alternative services. On the other hand, weakly undervolting machines defies the scrooge cloud provider’s original idea of minimizing the electricity bill.
The cloud provider has limited options to completely prevent the detection method from unveiling the scrooge attack. Even the powerful setup of the cloud provider to tamper with CPU voltage readings is not sufficient to deny the detection method. The scrooge attack has the disadvantage that detection methods have a simple design, but it has the advantage that proving the undervolt state without the firmware is difficult.
V. EVALUATION
In this section we explore the behavior of Raspberry Pi processors under different nominal and undervolted setups. The information gained from these experiments allows quantifying the attack parameters and determining the type of processes to use for the detection method. Then, we derive the probability at which our detection method can successfully uncover the attack. We begin by describing our experimental setup to undervolt Raspberry Pis before evaluating the firmware’s throttling behavior when reaching the soft limit and limit temperatures.
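The multiplication check described in Section IV can be sketched as follows. This is an illustrative sketch only, not the benchmark used in the evaluation: the function name, the 64-bit operand width, and the `multiply` hook (added so a fault can be simulated) are assumptions. A real detector would more likely be written in C or assembly so that the loop maps directly onto hardware multiply instructions; Python here only shows the control flow.

```python
"""Sketch of the multiplication check from Section IV (illustrative only)."""
import random
from typing import Callable, Optional


def exact_multiply(a: int, b: int) -> int:
    """Default multiplier; on healthy hardware this never faults."""
    return a * b


def find_multiplication_fault(
    max_rounds: Optional[int] = None,
    multiply: Callable[[int, int], int] = exact_multiply,
    seed: Optional[int] = None,
) -> Optional[int]:
    """Multiply two random operands repeatedly, alternating their positions.

    Returns the round in which a product differed from the reference result,
    or None if max_rounds completed without a mismatch. With max_rounds=None
    the loop runs until a mismatch occurs or the instance crashes.
    """
    rng = random.Random(seed)
    a = rng.getrandbits(64)
    b = rng.getrandbits(64)
    expected = multiply(a, b)  # the "original result" used as reference

    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # Alternate the position of multiplier and multiplicand.
        x, y = (a, b) if rounds % 2 == 0 else (b, a)
        if multiply(x, y) != expected:
            return rounds
        rounds += 1
    return None
```

Running one instance of this loop per core keeps the processor at maximum load. As the section notes, the loop itself may never observe a mismatch; on an undervolted instance other processes are expected to fail first.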
Table II: Soft limit (SL) firmware throttling on the 3B+
OV level   V_arm [V]   V_arm^SL [V]   f_arm [MHz]   f_arm^SL [MHz]
 0         1.3750      1.2688         1400          1200
-1         1.3500      1.2375         1400          1200
-2         1.3188      1.2125         1400          1200
-3         1.2938      1.1875         1400          1200
Table III: Limit temperature (L) throttling on the 3B and 4B
Model   V_arm^L [V]   f_arm^L [MHz]                    f_core^L [MHz]
3B      1.2813        {1034, 1087, 1141, 1195, 1200}   {400}
4B      0.8500        {1000, 1500}                     {333, 500}
The temperature-based guardband analysis allows detecting the critical region of the device and defines the margin for an undervolt setup. Faults that occurred during the guardband analysis are analyzed to describe the fault injection of the detection method. Finally, we measure the energy efficiency of the undervolted hardware with a reliability benchmark. The dataset gathered for this evaluation will be made publicly available at https://github.com/ChrisG55/Scrooge-Attack.
C. Limit temperature throttling
Next, we evaluate the firmware behavior when reaching the limit temperature while running under the CPUFreq performance governor. At the limit temperature, the firmware will throttle the processor to prevent thermal runaway. Notice that model 3B+ is not included here, as it takes too long to reach the limit temperature while already being throttled for going beyond the soft limit temperature. Neither the 3B nor the 4B reduces the voltage when reaching the limit temperature, as shown in Table III. However, both models reduce their frequency. The 3B reduces its ARM CPU frequency $f^{L}_{arm}$ in steps of about 54 MHz (except for the first step) while the 4B significantly reduces its frequency by 500 MHz. In addition, the 4B also reduces its GPU frequency $f^{L}_{core}$ by 167 MHz. We find that reaching the limit temperature will reduce the load put on the processor by the detection mechanism and reduce its temperature, which leads to a lower fault injection rate.
D. Temperature-based guardband analysis
[Figure: three panels of voltage [V] against temperature [°C]; legend: safe, critical, failure, nominal, undervolted]
[Figure: failure rate for the 3B and 3B+ at 0 mV, −75 mV, and −100 mV]
oops diagnose in the system log. We used this information to analyze the guardband failures and summarized it in Figure 5. Notice that the 4B is not included, as its firmware does not provide undervolting support and we could not provoke any
Table IV: STRESS-NG ETR heat map indicating the relative energy efficiency for an undervolted setup compared to a nominal setup. The darker the shade, the more energy-efficient the stressor ran.
active, 3B −75 mV:   0.94 0.95 0.96 0.95 0.92 1.02 1.03 0.90 0.95 0.95 0.99 0.93 0.93 0.94 1.02 0.94 0.91 0.96 0.95 0.93 0.94 0.91 0.94 0.99 0.95 0.96 0.94
active, 3B+ −75 mV:  0.89 0.94 0.93 0.93 0.87 0.99 0.94 1.00 0.95 0.94 1.01 0.93 0.96 0.94 0.97 0.94 0.95 0.92 0.95 0.93 0.93 0.92 0.94 0.95 0.95 0.97 0.94
active, 4B −15 mV:   1.01 0.99 1.02 0.99 1.02 0.98 1.00 1.06 1.00 0.98 0.96 1.00 1.04 0.99 0.96 1.00 0.97 0.70 0.98 0.98 0.99 1.00 0.97 0.91 0.98 1.00 0.99
passive, 3B −75 mV:  0.88 0.95 0.92 0.93 0.91 0.66 0.76 0.63 0.94 0.94 0.97 0.93 0.94 0.94 0.93 0.93 0.92 1.03 0.93 0.95 0.92 0.95 0.92 0.96 0.94 0.96 0.93
passive, 3B+ −75 mV: 0.95 0.95 0.96 0.95 0.98 0.97 0.99 0.80 0.95 0.95 0.94 0.95 0.95 0.95 0.98 0.95 0.95 0.96 0.94 0.95 0.94 0.91 0.95 0.97 0.96 0.97 0.95
passive, 4B −15 mV:  1.00 0.97 1.02 0.99 1.01 0.84 1.00 1.10 0.99 1.03 0.99 0.98 1.00 1.00 0.97 1.00 1.03 1.01 1.00 1.05 1.04 0.87 1.00 0.91 0.99 0.97 0.99
Figure 6: Run-time and temperature histograms of bare-metal and container instances. (Panels: (a) Bare-metal 3B run-time, (b) Bare-metal 3B temperature, (c) Kubernetes 3B run-time, (d) Kubernetes 3B temperature, (e) Bare-metal 3B+ run-time, (f) Bare-metal 3B+ temperature, (g) Kubernetes 3B+ run-time, (h) Kubernetes 3B+ temperature; x-axes: Run-time [s] or Temperature [°C]; y-axis: Frequency.)
stressors 5 % / 9 % (active/passive) were saved on the 3B, 6 % / 6 % were saved on the 3B+, and 2 % / 1 % were saved on the 4B. The highest energy efficiency observed on the 3B was −10 % / −37 % on the hrtimers stressor. On the 3B+ −13 % / −20 % were saved on the fork / hrtimers stressor. Finally, −30 % / −16 % were saved on the 4B with the hrtimers / futex stressor.
G. Detection method parameters
In this subsection we quantify the detection method parameters (i.e., run-time and temperature) based on undervolted bare-metal and container instances. Virtual machines are impracticable to deploy on the Raspberry Pi and were therefore not included in our evaluation. To run containers on the Raspberry Pi we deployed a small Kubernetes cluster.
Figure 6 shows histograms with crashes on bare-metal and container instances deployed on the 3B and 3B+. We show the run-time of our detection method and the temperature at which instances crashed. Our observations made with the temperature-based guardband analysis in subsection V-E are confirmed by the temperature histograms. The run-time strongly depends on the processor’s capability to heat up to a certain temperature and is therefore not an ideal parameter. We observe clear differences between the thermal designs of the two models. For the 3B our detection method requires about 175 s / 30 s to reach 62 °C to crash bare-metal or container instances. On the 3B+ we require about 145 s / 250 s to reach 62 °C to crash bare-metal or container instances. Interestingly, container instances crash on the 3B earlier than bare-metal instances. We assume the computing requirements from the container environment work in favor of the detection method.
VI. DISCUSSION
From our evaluation we conclude that the detection method is best used in combination with other processes such as in STRESS-NG. The user even has the option to scale the number of threads in the detection method to adjust the crash time of an instance as well as the injection rate. A simple CPU-bound program like the multiplication benchmark turns out to be ideal for injecting faults in an undervolted setup. The advantage of such a simple CPU-bound program is that it is unlikely to inject faults during its own execution and can run until a kernel panic while raising heat dissipation. In terms of energy efficiency we observed that by undervolting the cloud provider can save on average 5 % and up to 37 % for specific workloads on ARM processors.
RA1: as shown by our extensive experimental evaluation, in order to pull off a stealthy undervolting strategy, a malicious cloud provider must exchange any firmware configuration to undervolt the hardware and intercept any voltage requests coming from users.
RA2: a cloud user can uncover such an undervolting strategy our detection method more deterministic by injecting faults in
by running a simple CPU-bound benchmark until enough processes more selectively.
processes have failed to render the cloud instance unavail-
able. The drawback of this detection method is that it is non- ACKNOWLEDGMENTS & D ISCLAIMER
selective and cloud instances can fail either soon or late.
The views and opinions of the authors do not neces-
sarily reflect those of the U.S. government or Lawrence
VII. R ELATED W ORK Livermore National Security, LLC neither of whom nor
Undervolting the supply voltage for energy savings has been any of their employees make any endorsements, express
explored on CPUs for ARM [46, 47], x86 processors [16, 48] or implied warranties or representations or assume any le-
the Itanium micro-architecture [49], and for POWER-7 proces- gal liability or responsibility for the accuracy, complete-
sors [36]. This experimental undervolting approach has been ness, or usefulness of the information contained herein.
extended to GPUs [50] and FPGAs [40] as well. On the CPU This work was partially prepared by LLNL under Contract
side, frameworks to automate and optimize the process of DE-AC52-07NA27344 (LLNL-CONF-817551) and by
undervolting have been developed [14, 46]. Recently, AMD the European Union’s Horizon 2020 research and innovation
has announced an undervolting product/framework for their programme under the LEGaTO Project (legato-project.eu),
most recent Ryzen 5000 CPUs [51]. In [52] the authors grant agreement No 780681.
discuss the trade-off between the reduced energy cost and the
SLA violation penalties introduced by higher node failures R EFERENCES
of undervolted X86 and ARM nodes. In CLKSCREW [20], [1] Ampere eMAG 8180 64-bit Arm Processor, Amp 2018-0007 ed.,
the undervolting capabilities of modern ARM processors is Ampere Computing, 4655 Great America Parkway, Suite 601,
exploited to compromise system security, by targetting un- Santa Clara, CA 95054, 2018.
dervolting faults to specific hardware components to extract [2] “Ampere Altra: The World’s First Cloud Native Processor,”
https://amperecomputing.com/altra/, Nov 2020, last accessed on
cryptographic keys. 2021-04-23.
[3] “Huawei Unveils Industry’s Highest-Performance ARM-
VIII. C ONCLUSION AND O PEN C HALLENGES
based CPU,” https://www.huawei.com/en/news/2019/1/
A cloud provider can obfuscate the undervolting of pro- huawei-unveils-highest-performance-arm-based-cpu, Jan
cessors and even run workloads up to 37 % more energy- 2019, last accessed on 2021-04-23.
efficiently. However, by undervolting its infrastructure, the [4] “NVIDIA Grace CPU,” https://www.nvidia.com/en-us/
data-center/grace-cpu/, Apr 2021, last accessed on 2021-04-23.
cloud provider incurs a major risk. Not only does the cloud [5] “AWS Graviton Processor,” https://aws.amazon.com/ec2/
provider reduce the margin of error but also the system’s graviton/, last accessed on 2021-04-23.
stability is at stake. Cloud users can with high probability [6] J. Barr, “Coming Soon - Graviton2-Powered General
detect such situations and exploit them using a simple CPU- Purpose, Compute-Optimized, & Memory-Optimized
bound benchmark. To some extent, the cloud provider can EC2 Instances,” https://aws.amazon.com/de/blogs/aws/
coming-soon-graviton2-powered-general-purpose-compute-optimized-memo
mitigate stability issues with appropriate cooling systems. Dec 2019, last accessed on 2021-04-23.
However, it is questionable if the gains of undervolting the [7] ThunderX Family of Workload Optimized Processors, Cavium,
infrastructure outweigh the costs of such cooling systems. 2315 N. First Street, San Jose, CA 95131, 2016.
Cloud users’ options to detect an undervolted ARM in- [8] T. Yoshida, “Fujitsu High Performance CPU for the Post-
stance remain limited and, as shown in this paper, essentially K Computer,” https://old.hotchips.org/hc30/2conf/2.13_Fujitsu_
HC30.Fujitsu.Yoshida.rev1.2.pdf, Aug 2018, last accessed on
depend on the probability to inject faults non-selectively in 2021-04-23.
processes. As our temperature-based guardband analysis and [9] “Raspbeery Pi Products,” https://www.raspberrypi.org/
failure evaluation have shown, the higher the processor’s tem- products/, last accessed on 2021-04-23.
perature, the more likely faults can be injected into processes. [10] Y. Léger, “Public Preview,” https://blog.scaleway.com/
Despite such a powerful cloud provider attacker model, cloud online-labs-public-preview, Oct. 2014, last accessed on
2021-04-23.
users have an exploitable weak link. Their only option for [11] “Neoverse N1,” https://developer.arm.com/ip-products/
presuming a potentially undervolted instance is by increasing processors/neoverse/neoverse-n1, last accessed on 2021-04-23.
the processor’s heat dissipation. Heat dissipation is increased [12] T. Burd and R. Brodersen, “Energy efficient cmos
by tuning the CPU frequency and load to the processor’s microprocessor design,” in 2014 47th Hawaii International
limit. Under these thermal conditions and an undervolted setup Conference on System Sciences, vol. 1. Los Alamitos, CA,
USA: IEEE Computer Society, jan 1995, p. 288. [Online].
the fault injection probability in processes is rising. Ideally Available: https://doi.ieeecomputersociety.org/10.1109/HICSS.
cloud instances will become unavailable and violate the SLA 1995.375385
as a result of continuously failing processes. Our detection [13] V. M. van Santen, H. Amrouch, N. Parihar, S. Mahapatra, and
method depends strongly on hardware and how systems such J. Henkel, “Aging-aware voltage scaling,” in 2016 Design, Au-
as firmware and AVS react to excessive heat dissipation. As tomation Test in Europe Conference Exhibition (DATE), 2016,
pp. 576–581.
future plans, we intend to expand this study to a more diverse [14] K. Parasyris, P. Koutsovasilis, V. Vassiliadis, C. D. Antonopou-
set of ARM-based hardware targets, focusing in particular on los, N. Bellas, and S. Lalis, “A framework for evaluating
current and future cloud offerings. We would also like to make software on reduced margins hardware,” in 2018 48th Annual
9
IEEE/IFIP International Conference on Dependable Systems Technology for Intel,” https://www.intel.com/content/www/us/
and Networks (DSN). IEEE, 2018, pp. 330–337. en/support/articles/000007073/processors.html, Jun. 2020, last
[15] G. Papadimitriou, A. Chatzidimitriou, D. Gizopoulos, V. J. accessed on 2021-04-23.
Reddi, J. Leng, B. Salami, O. S. Unsal, and A. C. Kestelman, [31] Cool’n’Quiet Technology Installation Guide for AMD Athlon
“Exceeding conservative limits: A consolidated analysis on 64 Processor Based Systems, 0th ed., Advanced Micro Devices
modern hardware margins,” IEEE Transactions on Device and Inc., Jun. 2004.
Materials Reliability, vol. 20, no. 2, pp. 341–350, 2020. [32] AMD PowerNow! Technology, A ed., Advanced Micro Devices
[16] P. Koutsovasilis, K. Parasyris, C. D. Antonopoulos, N. Bellas, Inc., Nov. 2000.
and S. Lalis, “Dynamic undervolting to improve energy effi- [33] L. S. Nielsen, C. Niessen, J. Sparso, and K. Van Berkel, “Low-
ciency on multicore x86 cpus,” IEEE Transactions on Parallel power operation using self-timed circuits and adaptive scaling
and Distributed Systems, vol. 31, no. 12, pp. 2851–2864, 2020. of the supply voltage,” IEEE Transactions on Very Large Scale
[17] K. Murdock, D. Oswald, F. D. Garcia, J. Van Bulck, D. Gruss, Integration (VLSI) Systems, vol. 2, no. 4, pp. 391–397, 1994.
and F. Piessens, “Plundervolt: Software-based Fault Injection [34] J. L. Henning, “SPEC CPU2006 benchmark descriptions,” ACM
Attacks against Intel SGX,” in Proceedings of the 41st IEEE SIGARCH Computer Architecture News, vol. 34, no. 4, pp. 1–
Symposium on Security and Privacy (S&P’20), 2020, 41st IEEE 17, 2006.
Symposium on Security and Privacy (S&P’20). [35] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC
[18] Z. Kenjar, T. Frassetto, D. Gens, M. Franz, and A.-R. Sadeghi, Benchmark Suite: Characterization and Architectural Implica-
“V0ltpwn: Attacking x86 processor integrity from software,” in tions,” in Proceedings of the 17th International Conference
29th {USENIX} Security Symposium ({USENIX} Security 20), on Parallel Architectures and Compilation Techniques, October
2020, pp. 1445–1461. 2008.
[19] P. Patel, A. H. Ranabahu, and A. P. Sheth, “Service level [36] Y. Zu, C. R. Lefurgy, J. Leng, M. Halpern, M. S. Floyd,
agreement in cloud computing,” 2009. and V. J. Reddi, “Adaptive guardband scheduling to improve
[20] A. Tang, S. Sethumadhavan, and S. Stolfo, “{CLKSCREW}: system-level efficiency of the POWER7+,” in 2015 48th An-
exposing the perils of security-oblivious energy management,” nual IEEE/ACM International Symposium on Microarchitecture
in 26th {USENIX} Security Symposium ({USENIX} Security (MICRO). IEEE, 2015, pp. 308–321.
17), 2017, pp. 1057–1074. [37] L. Tan, N. DeBardeleben, Q. Guan, S. Blanchard, and
[21] Z. Chen, G. Vasilakis, K. Murdock, E. Dean, D. Oswald, M. Lang, “Using virtualization to quantify power conservation
and F. D. Garcia, “VoltPillager: Hardware-based fault injection via near-threshold voltage reduction for inherently resilient
attacks against intel SGX Enclaves using the SVID voltage applications,” Parallel Computing, vol. 73, pp. 3–15,
scaling interface.” USENIX Association, Aug. 2021. [Online]. 2018, parallel Programming for Resilience and Energy
Available: https://www.usenix.org/conference/usenixsecurity21/ Efficiency. [Online]. Available: https://www.sciencedirect.com/
presentation/chen-zitai science/article/pii/S0167819117300996
[22] L. Upton, “Raspberry Pi colocation,” https://www.raspberrypi. [38] Y. Kim, L. K. John, S. Pant, S. Manne, M. Schulte, W. L.
org/blog/raspberry-pi-colocation, Apr 2013, last accessed on Bircher, and M. S. S. Govindan, “Audit: Stress Testing the
2021-04-23. Automatic Way,” in 2012 45th Annual IEEE/ACM International
[23] E. Blem, J. Menon, and K. Sankaralingam, “Power struggles: Symposium on Microarchitecture, 2012, pp. 212–223.
Revisiting the RISC vs. CISC debate on contemporary ARM [39] Z. Hadjilambrou, S. Das, M. A. Antoniades, and Y. Sazeides,
and x86 architectures,” in 2013 IEEE 19th International Sym- “Sensing CPU Voltage Noise Through Electromagnetic Emana-
posium on High Performance Computer Architecture (HPCA), tions,” IEEE Computer Architecture Letters, vol. 17, no. 1, pp.
2013, pp. 1–12. 68–71, 2018.
[24] H. Esmaeilzadeh, E. Blem, R. S. Amant, K. Sankaralingam, [40] B. Salami, E. B. Onural, I. E. Yuksel, F. Koc, O. Ergin, A. C.
and D. Burger, “Dark silicon and the end of multicore scaling,” Kestelman, O. S. Unsal, H. Sarbazi-Azad, and O. Mutlu, “An
in 2011 38th Annual International Symposium on Computer Experimental Study of Reduced-Voltage Operation in Mod-
Architecture (ISCA), 2011, pp. 365–376. ern FPGAs for Neural Network Acceleration,” in 2020 50th
[25] E. Le Sueur and G. Heiser, “Dynamic Voltage and Frequency IEEE/IFIP International Conference on Dependable Systems
Scaling: The Laws of Diminishing Returns,” in Proceedings of and Networks (DSN), 2020.
the 2010 International Conference on Power Aware Computing [41] K. Beer, “How encryption works in AWS,” https://image.
and Systems, ser. HotPower’10. USA: USENIX Association, slidesharecdn.com/repeat-how-encryption-works-, Jun. 2019,
2010, p. 1–8. last accessed on 2021-04-23.
[26] M. Eleršič, “linux-intel-undervolt,” https://github.com/mihic/ [42] “Open Portable Trusted Execution Environment,” https://www.
linux-intel-undervolt, Aug 2017, last accessed on 2021-04-23. op-tee.org, last accessed on 2021-04-23.
[27] D. Brodowski, “Linux CPUFreq - CPU frequency and voltage [43] “AWS Nitro System,” https://aws.amazon.com/ec2/nitro/, last
scaling code in the Linux kernel,” https://www.kernel.org/doc/ accessed on 2021-04-23.
html/latest/cpu-freq/index.html, last accessed on 2021-04-23. [44] PowerSpy2, 1st ed., Alciom, 4 Mar. 2013.
[28] M. Weiser, B. Welch, A. Demers, and S. Scott, “Scheduling [45] Arm Architecture Reference Manual: Armv8, for Armv8-A archi-
for Reduced CPU Energy,” in First Symposium on tecture profile, Arm ddi 0487f.b (id040120) ed., Arm Limited,
Operating Systems Design and Implementation (OSDI Mar. 2020.
94). Monterey, CA: USENIX Association, Nov. 1994. [46] G. Papadimitriou, M. Kaliorakis, A. Chatzidimitriou, D. Gi-
[Online]. Available: https://www.usenix.org/conference/osdi-94/ zopoulos, P. Lawthers, and S. Das, “Harnessing voltage margins
scheduling-reduced-cpu-energy for energy efficiency in multicore cpus,” in Proceedings of the
[29] J.-T. Wamhoff, S. Diestelhorst, C. Fetzer, P. Marlier, P. Felber, 50th Annual IEEE/ACM International Symposium on Microar-
and D. Dice, “The TURBO Diaries: Application-Controlled chitecture, 2017, pp. 503–516.
Frequency Scaling Explained,” in Proceedings of the 2014 [47] G. Papadimitriou, A. Chatzidimitriou, M. Kaliorakis, Y. Vas-
USENIX Conference on USENIX Annual Technical Conference, takis, and D. Gizopoulos, “Micro-viruses for fast system-level
ser. USENIX ATC’14. USA: USENIX Association, 2014, p. voltage margins characterization in multicore cpus,” in 2018
193–204. IEEE International Symposium on Performance Analysis of
[30] “Frequently Asked Questions about Enhanced Intel SpeedStep Systems and Software (ISPASS). IEEE, 2018, pp. 54–63.
10
[48] G. Papadimitriou, M. Kaliorakis, A. Chatzidimitriou, C. Mag-
dalinos, and D. Gizopoulos, “Voltage margins identification on
commercial x86-64 multicore microprocessors,” in 2017 IEEE
23rd International Symposium on On-Line Testing and Robust
System Design (IOLTS). IEEE, 2017, pp. 51–56.
[49] A. Bacha and R. Teodorescu, “Dynamic reduction of voltage
margins by leveraging on-chip ecc in itanium ii processors,”
in Proceedings of the 40th Annual International Symposium on
Computer Architecture, 2013, pp. 297–307.
[50] J. Leng, A. Buyuktosunoglu, R. Bertran, P. Bose, and
V. J. Reddi, “Safe limits on voltage reduction efficiency in
gpus: a direct measurement approach,” in 2015 48th Annual
IEEE/ACM International Symposium on Microarchitecture (MI-
CRO). IEEE, 2015, pp. 294–307.
[51] I. Cutress, “AMD Precision Boost Overdrive 2:
Adaptive Undervolting For Ryzen 5000 Coming
Soon,” https://www.anandtech.com/show/16267/
amd-precision-boost-overdrive-2-adaptive-undervolting-for-ryzen-5000-coming-soon,
Nov 2020, last accessed on 2021-04-23.
[52] C. Kalogirou, P. Koutsovasilis, C. D. Antonopoulos, N. Bellas,
S. Lalis, S. Venugopal, and C. Pinto, “Exploiting cpu voltage
margins to increase the profit of cloud infrastructure providers,”
in 2019 19th IEEE/ACM International Symposium on Cluster,
Cloud and Grid Computing (CCGRID), 2019, pp. 302–311.
11