Csail: Identification and Authentication of Integrated Circuits
Csail: Identification and Authentication of Integrated Circuits
Csail: Identification and Authentication of Integrated Circuits
Identification and
Authentication of Integrated
Circuits
Blaise Gassend1 , Dwaine Clarke1 , Daihyun Lim1
Marten van Dijk2 , Srinivas Devadas1
1
SUMMARY
This paper describes a technique to reliably and securely identify individual integrated
circuits (ICs) based on the precise measurement of circuit delays and a simple challengeresponse protocol. This technique could be used to produce key-cards that are more
difficult to clone than ones involving digital keys on the IC. We consider potential venues
of attack against our system, and present candidate implementations. Experiments on
Field Programmable Gate Arrays show that the technique is viable. Finally, we analyze
the difficulty of breaking the system in an idealized additive delay model.
key words:
1.
Introduction
We describe a technique to identify and authenticate arbitrary integrated circuits (ICs) based
on a prior delay characterization of the IC. While ICs can be reliably mass-manufactured to
have identical digital logic functionality, the premise of our approach is that each IC is unique
in its delay characteristics due to inherent variations in manufacturing across different dies,
wafers, and processes. While digital logic functionality relies on timing constraints being met,
different ICs with the exact same digital functionality will have unique behaviors when these
constraints are not met, because their delay characteristics are different.
Researchers have proposed the addition of specific circuits that produce unique responses
due to manufacturing variations in ICs such that these ICs can be identified (e.g., [12]).
Correspondence to: Massachusetts Institute of Technology, Laboratory for Computer Science, Cambridge, MA
02139, USA
However, with these techniques, the focus is simply on assigning a unique identifier to each
chip, without having security in mind. In order to authenticate an IC, a key has to be placed
within the IC, access to the key has to be restricted to cryptographic primitives, and the IC
has to be made tamper resistant, so attempts by the adversary to determine the key destroys
the key. In essence, digital information has to be hidden in the IC. This information can then
be used to simply identify the IC, or it can be used to enable a wide range of applications that
rely on keyed cryptographic primitives.
Making an IC tamper-resistant to all forms of attacks is a challenging problem and is
receiving some attention [3]. Numerous attacks are described in the literature. These attacks
may be invasive, e.g., removal of the package and layers of the IC, or non-invasive, e.g.,
differential power analysis [10], that attempts to determine the key by stimulating the IC and
observing the power and ground rails. IBMs PCI Cryptographic Coprocessor encapsulates a
486-class processing subsystem within a tamper-sensing and tamper-responding environment
where one can run security-sensitive processes [16]. However, providing high-grade tamper
resistance, which makes it impossible for an attacker to access or modify the secrets held
inside a device, is expensive and difficult [1, 2].
We propose that authentication be based on hidden delay or timing information
corresponding to a circuit rather than digital information. We will argue that the level of
tamper resistance required to hide delay information is significantly less than for digital
information. Invasive methods to determine device and wire delays will invariably change the
delay of the devices or wires upon removal of the package or metal layers. Further, non-invasive
attacks that are sometimes successful in discovering secret digital keys such as differential
power analysis (DPA) [10] and electromagnetic analysis (EMA) [14] fail to provide precise
enough delay information to break delay-based authentication. Another important difference
between hiding digital information versus timing information is that in the former case the
manufacturer can produce many ICs with the same hidden digital key, but it is very hard,
if not impossible, for a manufacturer to produce two ICs that are identical in terms of their
delay characteristics.
To elaborate, our thesis is that there is enough manufacturing process variations across
ICs with identical masks to uniquely characterize each IC, and this characterization can be
performed with a large signal-to-noise ratio (SNR). The characterization of an IC involves
the generation of a set of challenge-response pairs. To authenticate ICs we require the set of
challenge-response pairs to be characteristic of each IC. For reliable authentication, we require
that environmental variations and measurement errors do not produce so much noise that they
hide inter-IC variations. We will show in this paper, using experiments and analysis, that we
can perform reliable authentication.
The rest of this paper will be structured as follows: We describe the notion of a physical
random function, which is what we are trying to implement in Section 2. An overview of
our approach to identify and authenticate ICs based on delays is given in Section 3. Then,
in Section 4 we describe some applications, in particular a secure key card application. We
consider plausible attacks on PUFs in Section 5. Section 6 presents a candidate PUF circuit
based on delay measurement, and discusses its identification characteristics. Section 7 presents
a second candidate circuit that compares two delays, and goes over relevant experimental
measurements. Finally, Section 8 shows how these candidate circuits can be attacked in the
additive delay model.
2.
Definitions
Delay-Based Authentication
3.1.
When a circuit is replicated across dies or across wafers, manufacturing variations cause
appreciable differences in circuit delays. Across a die, device delays vary due to mask variations
PUF
actually stands for Physical Unclonable Function. It has the advantage of being easier to pronounce,
and it avoids confusion with Pseudo-Random Functions.
this is sometimes called the system component of delay variation. There are also random
variations in dies across a wafer, and from wafer to wafer due to, for instance, process
temperature and pressure variations, during the various manufacturing steps. The magnitude of
delay variation due to this random component can be 5% or more for metal wires, and is higher
for devices (see chapter 12 of [5]). Delay variations of the same wire or device in different dies
have been modeled using Gaussian distributions and other probabilistic distributions (e.g., [4]).
We briefly note here that in our experiments, the standard deviation of path delays in our
example circuits across different FPGAs was in the range of 400 ppm.
3.2.
Environmental Effects
On-chip measurement of delays can be carried out with very high accuracy, and therefore the
signal-to-noise ratio when delays of corresponding wires across two or more ICs are compared is
quite high, provided environmental variation is low. However, temperature and power supply
voltage have a significant effect on the absolute values of circuit delays [18]. To keep the
signal-to-noise ratio high under significant environmental variations, we require compensated
delay measurement (cf. Section 6.2.1). Using compensated delay measurement, we can keep
the variation due to standard deviation sufficiently below inter-chip variation to allow reliable
identification despite a wide range of environmental variations.
3.3.
4.
4.1.
Applications
Secure Keycard
The simplest application for PUFs is to make tamper-resistant, unforgeable key cards. This
application was first described in [15]. We will argue in Section 5 that silicon PUFs are difficult
to forge and, as a result, these key cards are difficult to clone. The cards can also be combined
with biometrics to help identify users.
These cards can be used for authenticated identification, in which someone or something
with physical access to the card can use it to gain access to a protected resource. The general
model is that of a principal with the key card presenting it to a terminal at a locked door.
The terminal can connect via a private, authentic channel to a remote, trusted server. The
server has already established a private list of Challenge-Response Pairs (CRPs) with the card.
When the principal presents the card to the terminal, the terminal contacts the server using
the secure channel, and the server replies with the challenge of a randomly chosen CRP in
its list. The terminal forwards the challenge to the card, which determines the response. The
response is sent to the terminal and forwarded to the server via the secure channel. The server
checks that the response matches what it expected, and, if it does, sends an acknowledgment
to the terminal. The terminal then unlocks the door, allowing the user to access the protected
resource. The server should only use each challenge once, to prevent replay attacks; thus, the
user is required to securely renew the list of CRPs on the server periodically.
4.2.
Controlled PUFs
As we have implemented them in this paper, card-PUFs can be used for authenticated
identification, as described above. However, unlike the PUFs from [15], silicon PUFs can be
accompanied on the same chip with control logic that restricts access to the PUF. In this case
we have a Controlled PUF. By using the methods described in [7], a controlled PUF can be
used to establish a shared secret between a remote party and a trusted chip. Because of the
way the secret is embedded in the PUF, it is much harder for an adversary to impersonate
the trusted chip than it would be if the chip had a secret stored on itself in digital form. This
improved resistance to physical attacks is the principal advantage of using a PUF.
The applications of Controlled PUFs are all the applications that can benefit from having
a shared secret between a chip and a remote party. Digital rights management, set-top boxes
and distributed computation are examples of such applications. More details can be found
in [7].
5.
Attacks
There are many possible attacks on silicon PUFs we describe some of them in this section.
5.1.
Duplication
Timing-Accurate Model
Alternately, the adversary can attempt to create a timing-accurate model of the original PUF
and simulate the model to respond to challenges, in effect creating a virtual counterfeit. The
accuracy of this model has to be comparable to the accuracy of reliable (on-chip) circuit delay
measurement in order to produce a successful virtual counterfeit. Here, the adversary has three
options, direct measurement, exhaustive enumeration of challenges, and model-building using
observed responses based on a subset, i.e., a polynomial number of challenges.
5.2.1.
Direct Measurement
The adversary can attempt to directly measure device delays of the circuit by probing
or monitoring internal devices. He can then use these measured delays in a more or less
sophisticated timing model.
In order to do this at the level of accuracy required to break authentication, he will have
to remove the package and insert probes. Indeed, non-invasive attacks such as DPA [10] and
EMA [14] extract information about collections of devices, not individual devices. Probing with
sufficient precision is likely to be very difficult because the adversary runs the risk of changing
the circuit delays while probing. Interactions between the probe and the circuit will directly
influence the circuit. Moreover, in order to insert his probes, the adversary will potentially have
to damage overlaid wires. Because of the high capacitive coupling between neighboring wires
(see [6] for the importance of capacitive coupling between wires), damage to these overlaid
wires could significantly change the delay that is to be measured.
How best to lay out the PUF circuit to make it highly sensitive to invasive attacks is a
direction for further research.
5.2.2.
Exhaustive Model
Clearly, a model can be built by exhaustively enumerating all possible challenges, but this is
intractable, since there are an exponential number of possible challenges.
5.2.3.
The adversary can use a publicly available mask description of the IC/PUF and apply
challenges and monitor responses and attempt to build a timing-accurate model.
We first note that creating accurate timing models given mask information is an intensive
area of research. Even the most detailed circuit models have a resolution that is significantly
coarser than the resolution of reliable delay measurement. If an adversary is able to find
a general method to determine polynomial-sized timing models that are accurate to within
measurement errors, this would represent a breakthrough. However, the adversary has a slightly
different problem he needs to build a highly accurate model of a particular IC, to which he
has access, and to which he can apply challenges and monitor responses.
The transient response of an IC is a non-linear and non-monotonic function of the delays
of wires and devices in the IC. The adversary has to guess a general enough parameterizable
model (e.g., delay of a device is dependent on load capacitance and transitions of neighboring
devices), and obtain enough responses to well-chosen challenges such that he obtains a system
of equations that can be inverted to obtain the parameters of his model.
In Section 8 we will return to modeling attacks when considering the difficulty of modeling
our proposed authentication circuits.
bi = 0
d4i+2
d4i+2
d4i+3
d4i+3
bi = 1
Figure 1. The switch component. For bi = 0 the paths go straight through, for bi = 1 the paths are
crossed.
6.
Finding a delay circuit that produces a satisfactory PUF that is provably hard to break is
difficult because of the numerous different types of attacks that are possible. It is unclear how
classical hard problems such as factorization or discrete logarithm could be embedded in the
analog behavior of a physical system.
This section shows a candidate circuit that we have performed experiments on. We
conjectured in [9] that this circuit would be difficult to break in the additive delay model
(see Section 8.1), in which we assume that the delay of a complex circuit can be exactly
broken up into a sum of delays of components that make it up. More recent work on a similar
circuit that we present in Section 7 suggests that our conjecture was incorrect. Therefore,
to create a PUF that is secure in the additive delay model, modifications will be needed to
make this circuit more difficult to analyze. For practical purposes, we believe that the sheer
complexity of determining circuit delays precisely enough might turn out to be sufficient to
prevent modeling attacks on PUFs. Papers such as [11] show just how difficult precise delay
simulation can be.
The circuit for which we will measure delays that is implemented in our key card is depicted
in Figure 2. A challenge of n = 128 bits is transformed by a pseudorandom function into a
bit pattern b = (b1 , . . . , bn ). The bits bi control switches. If bi = 0, the switch is uncrossed
(Figure 1(a)); if bi = 1, the switch is crossed (Figure 1(b)). To get a response from this
circuit, we present a rising edge on its input. That edge is split into two competing edges
that independently propagate through the switches until they reach the AND gate. When the
slowest of the two edges reaches the AND gate, an edge appears on the output of the circuit.
The response of the circuit is the time it took for the edge on the input to produce an edge
on the output.
In the additive delay model, this response of the circuit for a given challenge can be expressed
as the max of two sums of elementary path delays. Thus we clearly see the need for the AND
gate. Without it, responses would be linear combinations of elementary delays, and obtaining
elementary delays from challenge-response pairs would reduce to solving a linear system of
equations. With the AND gate, each challenge-response pair produces two candidate equations,
switch
...
b1
b2
challenge
b3
AND
b127 b128
Pseudorandom Function
only one of which actually holds. Ideally for k pairs, the adversary would have 2 k different
systems of equations to choose from, though we expect an intelligent adversary to be able to
eliminate many of these possibilities.
If the adversary is able to directly chose the bi that gets presented to the device then there
are easy ways for him reduce the number of systems of equations he must deal with. For
example, by choosing two sets of bi that only differ on two neighboring terms, there is a high
likelihood that the same path was the slowest in both cases, in this way only two of the four
possible systems of equations need to be considered. With a little more work, the individual
delays in the circuit can once again be found by solving a linear system of equations. To
prevent this type of attack, we do not allow the adversary to directly chose the b i values. An
error correcting code would be sufficient to prevent the attack we described. We chose to use
a pseudorandom function as it allows us to prevent any kind of chosen challenge attack.
One more precaution is necessary. If one elementary delay is much longer than all the others,
then it will always be in the slowest path through the circuit, which once again reduces the
adversarys work to solving a linear system of equations. This problem is particularly serious
if the same elementary delay is particularly long on all instances of the device. To prevent
this problem, the delay circuit we request that the circuit be designed to be as symmetrical
as possible. That way, knowledge of the circuit layout cannot help the adversary guess which
path through the circuit is the slowest. Moreover, with these precautions, the likelihood that
one elementary delay will be much longer than the others through random process variations
is kept to a minimum.
10
6.1.
Circuit details
In order to prove that identification is possible using delay variations between Integrated
Circuits, we have implemented a PUF on Xilinx Spartan 2 FPGAs. In these tests, identical
circuits were placed on different FPGAs, and the resulting PUFs were compared. Our goal in
this section is to show that the identification is possible given the measurement noise levels
and manufacturing variations that we have observed.
Because we do not have full control over the circuits that are implemented in an FPGA, a
few compromises have to be made relative to the theoretical design.
First, the unpredictability of the MAX-circuit relies on having a circuit with a high level
of symmetry between paths. The general purpose routing infrastructure of an FPGA makes
it difficult to produce precisely matched paths. Therefore the FPGA circuits that we worked
with do not have the degree of symmetry that would be required for a PUF to be secure.
However, since the asymmetry is the same across all components, it does not make any change
to the difficulty in identifying components, which is what we will be discussing in this section.
The second limitation of FPGAs, is that the lack of analog components makes it impractical
to directly measure the delay of a path through the circuit with the precision that we require.
To get around this problem, we use self-oscillating loops containing the path for which we
want to measure the delay. Using digital circuitry, we can precisely measure the frequency of
the self oscillating loops over a few tens of thousands of periods.
Note, however, that the use of self oscillating loops to measure delays is not ideal, and should
not be used for a production design. First it drastically increases the time (and power) that is
required to evaluate the PUF. Worse, it makes the frequency that is being measured, which is
the response of the PUF to a challenge, vulnerable to differential power analysis. This is not
very problematic for a key card application, but can be fatal in the case of Controlled PUFs
(see [7]).
Figure 3 shows how a self oscillating loop is built around the delay circuit. Since this selfoscillating loop has to be used both for rising and falling transitions, the and gate that combines
the two paths of the delay circuit of Figure 2 has been replaced by a more complicated circuit
that switches when the slowest transition, be it rising or falling, reaches it. The circuit is
essentially a flip-flop that changes state when both outputs from the delay circuit are at the
same level.
The dotted box indicates a delicate part of the circuit that cannot be implemented exactly
as shown without running the risk of producing glitching in the output. In the FPGA it
is implemented by a lookup table. In an implementation with simple logic, it should be
implemented in normal disjunctive form. The representation that was used here was simply
chosen for ease of understanding.
The
11
...
...
Figure 3. Measuring delays using self-oscillating loops
6.2.
So far, all our discussion has considered that path delays in a circuit are constant for a given
component. In reality, this is far from the case. Environmental perturbations can account for
variations that are large enough to mask out the small manufacturing variations that we are
trying to measure. Therefore, they must be taken into account.
6.2.1.
Parameters such as temperature or supply voltage can cause variations in delay that are orders
of magnitude greater than the manufacturing variations that we are trying to observe. For a
30 degree Celsius change in temperature, the delays vary on the order of 5%. This is to be
compared with inter-chip variations that are well below 1% on this size of circuit.
Fortunately, we have found that environmental variations operate roughly proportionally on
all the delays in our circuit, and therefore, they can be compensated for by always working
with delay ratios instead of absolute delays. Therefore, we place two different self-oscillating
loops on the FPGA. We run both self-oscillating loops to get two frequencies, and take a ratio
of the two frequencies as the PUFs response.
Once compensation has been applied, the variation with temperature is of the same order
of magnitude as the measurement error.
For very large temperature changes (at least greater than 30 degrees Celsius), we can no
longer expect to reliably recognize a PUF. The answer to this problem is to characterize the
PUF once when it is hot and once when it is cold (more steps are possible for large temperature
ranges). During use, one of these two cases will apply, so the PUF will be correctly recognized.
Up to now, we have assumed that temperature is uniform across the integrated circuit. If
that is not the case then temperature compensation is likely not to work well. With the circuit
presented here, the paths are heated in a uniform way by the transitions that are running
through them. With other circuits in which transitions only reach some parts of the circuit, we
probability density
12
0.96
0.97
0.98
0.99
1
1.01
compensated measurement
1.02
1.03
Figure 4. Distribution of responses to randomly selected challenges. Each response is the ratio of the
frequencies of two simultaneously-running loops. As can be seen, when the loop frequencies are too
close, the loops lock and the response is unity.
have observed non uniform heating which can cause unreliable measurement results. Therefore,
we recommend the use of circuits that get heated in a uniform way during use.
6.2.2.
Another kind of environmental interference that has to be considered is the interaction between
a self-oscillating loop, and other circuitry on the integrated circuit.
Experiments in which we measure the frequency of a loop oscillating alone, or at the same
time as other loops show that the interference is very small. This has been demonstrated in
[8] where the interference was provided by seven self-oscillating loops, and once again in our
latest experiments where the frequencies of the two loops that are being measured can be
measured simultaneously or successively. In each case, the interference caused by the other
self-oscillating loops is of the same order of magnitude as measurement error.
There is however one case in which interference is non negligible. It is the case when the
interference is at almost the same frequency as the self-oscillating loop. In that case, the loops
frequency tends to lock on the perturbating frequency. Because of this, it is recommended
not to simultaneously measure the two frequencies that will get combined into a compensated
measurement. Figure 4 shows the result of locking on compensated measurements: values near
unity have been forced towards unity by the locking phenomenon.
6.2.3.
Aging
Through prolonged use, the delays of an integrated circuit are known to shift. We have not
yet studied the effect that aging might have on a PUF. In particular, if the changes due to
13
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
Abe vs. Abe, same conditions
Abe vs. Abe, different test boards
Abe vs. Abe 10 deg. Celsius hotter
Abe vs. Abe 20 deg. Celsius hotter
Abe vs. Abe 30 deg. Celsius hotter
Abe vs. Walt
Abe vs. Hal
0.2
0.1
0
1e-06
1e-05
0.0001
0.001
0.01
Figure 5. Comparing the FPGA called Abe at room temperature with itself in various conditions, or
with other FPGAs. The vertical axis indicates the probability that for a given challenge, the difference
in response will be lower than the difference in response that is indicated on the horizontal axis. These
plots illustrate the typical behavior we encountered in our experiments with many FPGAs.
aging are big enough, we might not be able to recognize a PUF after it has undergone much
use. Studying these aging effects is an important aspect that must be covered by future work.
6.3.
Identification Abilities
To test our ability to distinguish between FPGAs, we generated a number of profiles for many
different FPGAs in different conditions. A profile is made up of 128 challenge-response pairs.
All the profiles were established using the same challenges.
Two profiles can be compared in the following way: For each challenge look at the difference
between the responses. You can then look at the distribution of these differences. If most of
them are near zero, then the profiles are close. If they are far from zero then the profiles are
distant. During our experiments, the distribution of differences was typically Gaussian, which
allows us to characterize the difference between two profiles by a standard deviation.
Figure 5 shows the differences between the profile for an FPGA called Abe on Blaises test
board at room temperature, and a number of other profiles ( is the standard deviation):
Another profile of Abe on Blaises test board at room temperature ( 1 10 5 ). (This
reflects power supply variations with time at a reader.)
14
A profile of Abe on Taras test board at room temperature ( 2.5 105 ). (This reflects
power supply variations across card readers.)
Profiles of Abe on Blaises test board at 10, 20 and 30 degrees Celsius above room
temperature ( 5 105 to 1.5 104 ).
Profiles of FPGAs Hal and Walt on Blaises test board at room temperature ( 410 4 ).
The above standard deviations were typical across different FPGAs and comparisons of
different pairs of FPGAs.
Clearly, it is possible to tell FPGAs apart. Though our ability to tell them apart depends
on how much environmental variation we need to be robust to. Even with 30 degree Celsius
variations, each challenge is capable of providing 0.7 bits of information about the identity of
the FPGA. This goes up to 1.5 bits if only 10 degree Celsius variations are allowed.
If we want to distinguish between 1 billion different components we need a sufficient number
of bits to identify 1018 260 components (this is because of the birthday phenomenon). Getting
those 60 bits of information requires from 40 to 90 challenges depending on the temperature
variations that we are willing to tolerate.
The numbers that are given here are very dependent on the PUF circuit that is considered.
In the circuit that we studied in [8] we had a signal to noise ratio that was much better than we
observed in the current circuit. We believe that by paying more attention to how our circuit
is laid out, we will be able to build PUFs for which more bits can be extracted from each
challenge.
7.
Measuring delays using self-oscillating circuits as we did in Section 6 is easy and precise.
However, to get a precision in the hundreds of parts per million requires tens to hundreds of
thousands of clock cycles, which implies that thousands of edges will have to be propagated
through the delay circuit each time we want to measure a single delay. This means that
measurement is slow. To make delay measurement faster, we are trying to do direct delay
measurements, which only run a single edge through the delay circuit. This is not a trivial
task given that in our experiments, we have to measure the delays with precisions on the order
of tens of picoseconds.
The method that we have chosen to explore is to use an arbiter. The arbiter has two inputs,
which are both low initially. The arbiter waits for one of the inputs to go high, at which time
its output indicates which input went high first.
Figure 6 shows the structure of arbiter circuit and its operation. We are assuming as for
the MAX circuit that the challenge has been through a pseudorandom function before being
fed into the delay circuit. Two signals race through the delay paths that are defined by the
challenge input bits. At the end of circuit, the arbiter decides which signal arrived first and
outputs a bit.
If the difference in delay for the two racing paths isnt much larger than the inter-chip
variation in delay, then the value of the output bit is likely to be different from one chip to
another, which makes identification possible. For this reason, it is highly desirable for the
15
Figure 6. The overall scheme of arbiter circuit is similar with MAX circuit except for the arbiter
instead of AND gate. Arbiter detects faster signal between two input rising edges and produces 1 bit
information from it.
arbiter circuit to be designed to be as symmetrical as possible. This way, all the challenges
will have outputs that vary from chip to chip. Unfortunately, as we shall see, our FPGA
implementation was not very symmetrical so most of the bits have little identification value
for us.
7.1.
7.1.1.
Experimental Results
Inter-Chip Variation
To study the characteristics of our FPGA implementation of the arbiter circuit, we measured
response for 100,000 challenges across 23 FPGAs. For each challenge, we calculated the
probability of a response bit being 1. Figure 7 shows the distribution of those probabilities.
Assuming that delays have a Gaussian variation from chips to chip, and a Gaussian variation
from challenge to challenge, the measurements we made suggest that there is about 27 times
more variation due to challenges than due to inter-chip variation. Only 6% of challenges
actually have a response that changes from chip to chip. We are currently producing an ASIC
test chip with a delay circuit that was laid out by hand to maximize symmetry. This way most
of the challenges should produce useful identification information (or none at all if a skew is
introduced between the two paths).
16
0.5
0.45
Probability Density
0.4
P(Pr(response=1)=0) = 0.56
0.35
0.3
P(Pr(response=1)=1) = 0.38
0.25
0.2
0.15
0.1
0.05
0
0
0.2
0.4
0.6
0.8
Pr(response = 1)
Figure 7. How much the output of a challenge varies from chip to chip. The sample space is the set of
challenges. Triangles show experimental data, the line shows an fit with a theoretical model. For our
circuit the vast majority of challenges had the same output on all the chips we measured.
To study how different chips are from each other, we look at the number of challenges in our
100,000 challenge test set that change from one chip to another; this is called the Hamming
distance. In our experiments with 23 FPGAs, the average Hamming distance between chips
was 1049, roughly 1.05% of the challenges.
7.1.2.
Measurement Noise
17
Temp variation
1.4
1.2
Inter FPGA Variation (1.049%)
Noise %(N)
1
0.8
Best Model Limit (0.61%)
0.6
0.4
Measurement noise (0.098%)
0.2
0
0
10
15
20
25
30
35
40
Temperature difference(celsius)
Figure 8. Variation in normalized Hamming distance (noise) when a chip is compared with itself at
a different temperature. For comparison the inter-chip variation and our best attempt at a modelbuilding attack are also shown.
7.1.3.
Temperature variation
A nice aspect of the arbiter circuit is that since it is comparing the delays of two paths through
the circuit, it is automatically applying the temperature compensation methods that we had
to explicitly introduce for circuits that were based on delay measurements.
Figure 8 shows the amount of noise introduced by temperature variation. We made a profile
for 100,000 challenges at the reference temperature of 28 degrees Celsius and calculated its
distance from the response vector at increasingly higher temperatures. Even with a change
in temperature greater than 40 degrees Celsius, the noise is only about 0.3%, well below the
inter-chip variation of 1.06%
Overall, the arbiter circuit has qualitatively comparable performance to the delay
measurement circuit. If the proportion of challenges that are useful for identification can be
increased, it will be much faster to use than the delay measurement circuit, which makes it an
excellent candidate for PUF applications.
18
8. Modeling Attacks
As we have seen in Section 5.2, it must be difficult for on attacker to build an accurate timing
model of our circuit if we want to prevent him from impersonate it. To evaluate the difficulty
of this type of attack we tried to carry it out ourselves. The results we have to date suggest
that the basic MAX and arbiter circuits are not difficult enough to model, contrarily to what
we had conjectured in [9].
8.1.
Additive-Delay Model
We model the circuits in the additive model. In this model, we assume that the delay of a
path through the circuit is the sum of the delays of the elementary components that make
up the circuit. Knowing all the elementary component delays an attacker can calculate the
circuit delay for any challenge. The attackers task is to find the elementary delays. Assuming
that getting the elementary delays by opening the circuit and doing direct measurement is
impractical, we see that an attacker will have to infer those delays from the outputs of the
device on a number of challenges. Because of the pseudorandom function that has been placed
in the circuit, the challenges that the adversary has to work with are effectively random.
The circuits we have considered in sections 6 and 7 are made up of a sequence of identical
stages, each containing a switch block. The switch block can be modeled by using 4 delays
as shown in Figure 9(a). But this representation contains a lot of redundancy. First, as is
shown in Figure 9(b), the delay can be separated into a symmetric and an antisymmetric
part. The symmetric part gets added to both paths, and the antisymmetric part added to
one path and subtracted from the other. Finally, we note that the point where one stage
starts and the next one ends isnt clearly defined. By moving those points around, the number
of parameters to describe a stage can be reduced to 2 as illustrated in Figure 9(c). One to
describe the difference that the challenge bit makes to the symmetric part, and one to describe
the difference in delay between the two paths between the two paths. Repeating this operation
at each stage, and reordering the symmetric part leads to simplified circuit in Figure 9(d). All
together the n switched stages are modeled by using 2n + 2 parameters. For the arbiter half of
those parameters are unnecessary as only the difference between the two delays is considered
by the arbiter.
Algebraically, the delay of the top path can be expressed as
n
X
(1)ci vi + v +
i=0
n
X
(1)pi yi + y
i=0
where pi = ci cn and denotes the exclusive-or operation. The delay of the bottom
path is similar except that all the y terms have a change of sign. As we can see, the delay can
be seen as the sum of a constant and two dot products of delays by a vector that is a function
of the challenge. In the arbiter case there is only one dot product as the symmetric part is of
no use.
(a+b)/2
(a+b)/2
(ab)/2
a
c
(d
c
)/2
(c+d)/2
)/2
d
(c
19
(ba)/2
(c+d)/2
(a)
(b)
a+x+u
c
+u
+u
+x
u=(a+b+c+d)/4
v=(a+bcd)/4
x=(abc+d)/4
y=(ab+cd)/4
bx+u
(c)
v0
vn
...
v0
c0
y0
...
yn
y*
y0
...
yn
y*
v*
vn
c0
cn
cn
(d)
Figure 9. Simplifying the delay model of a switch block and resulting simplified model of a switch-based
delay circuit. The dots represent the points between which delays are measured.
20
8.2.
The perceptron algorithm [17] is the simplest available algorithm for training single layered
neural networks. In such a network, the output of a neuron is computed by taking the dot
product of an input vector (typically a vector of +1 and -1) with a vector of weights, and
comparing the result with a threshold. To train such a neuron on a set of input vectors, one
simply iterates through the input vectors. If an input produces the correct output, nothing is
changed. If an input produces an incorrect result, each one of the neurons weights in changed
in the direction that would tend to make it produce the correct result.
If the training set can be learned by the neural network (i.e., if the two types of outputs can
be separated by a hyper-plane in the input space), then the perceptron algorithm converges
to a solution that satisfies all the training cases. If not, a slightly improved version of the
algorithm can be used, which converges to the set of weights that maximizes the number of
training cases that the neuron correctly decides.
According to the analysis we did in Section 8.1 the arbiter circuit can be viewed as a neuron,
where the weights are a function of the elementary circuit delays, and the input vector is a
simple function of the challenge bits. Therefore, we can expect the arbiter circuit to be very
easy to build a model for under the additive delay model. What remains to be seen is whether
the additive delay model is precise enough to produce a model that can be used to impersonate
a physical device.
Because of the MAX operation, and because it has a real valued output, the MAX circuit
cannot be reduced exactly to the perceptron algorithm, however, we found that a slightly
modified algorithm gives excellent results. The method is as follows: first the model is initialized
with random elementary delays. Then for each example in the training set, the output of the
circuit is calculated. Based on how far the output is from the desired output, the elementary
delays on the path that was selected by the MAX function are adjusted to bring the output
closer to the desired value. In numerical experiments, we found that this method usually
perfectly learned the training set (in rare cases it gets stuck far from the correct solution, but
slightly different starting parameters then lead to perfect convergence). When noise was added
to the experiments, the algorithm converged to within the amount of noise that was added.
At this point, we must conclude that if the additive delay model applies sufficiently precisely
to our circuits, then a model building attack is possible and even easy. This does not necessarily
mean that the real circuits can be modeled though, as the additive delay model is not precise
enough to properly model the real device.
8.3.
Modeling Experiments
To try to get an idea of how much our circuit deviates from the additive delay model, we
attempted to train a model of the arbiter circuit using 90,000 challenge-response pairs measured
on our FPGA implementation. When evaluated on a different set of 100,000 challenge-response
pairs, the success rate of the model was only about 96.7%. This is greater than the inter-chip
variation, so one would be better off using a different FPGA than using our model when
attempting to impersonate a chip.
C=0 C=1
A
B
C
0
C=0 C=1
A = 0, B = 0
A = 1, B = 0
A = 0, B = 1
A = 1, B = 1
Multiplexer
O
A rising, B = 0, C = 0
Decode
Decode
Unused
C
0
21
Unused
A = 0, B = 0
A = 1, B = 0
A = 0, B = 1
A = 1, B = 1
Multiplexer
O
A rising, B = 1, C = 0
Figure 10. In a lookup table implementation of a multiplexer, the path that is taking by a rising edge
on input A to get to output O depends on the state of input B.
Our initial hypothesis for these disappointing (from the attackers point of view) results was
that we were getting interference between the two paths through the delay circuit. Indeed, the
switch component in the delay path is implemented by using a pair of multiplexer (MUX), that
each have the challenge bit and both of delay paths as inputs. In Xilinx FPGAs, combinational
logic is mainly implemented using 4 input look up tables (LUT). In particular, in the circuit
we implemented, the MUXes were being implemented as LUTs. These LUTs are laid out as
4 by 4 grids of SRAM cells as shown in Figure 10. As the figure shows, a rising edge that
goes through the MUX takes a different path through the grid depending on the value of
the spectator input that is not currently selected. To take this difference into account would
require a more complex model than the one we had considered.
To verify this hypothesis, we resynthesized our circuit using the MUXCY component that
is part of the FPGAs fast carry logic, hoping that this component would be implemented in
such a way that the path through the multiplexer does not depend on the spectator input.
This was a reasonable assumption as most multiplexer implementations have this property.
This time, the perceptron algorithm gave us a model that had a success rate of 99.4%. This is
still above measurement noise, even with 40 degree Celsius variations in temperature, but is
well below inter-chip variation (see Figure 8).
At this point, we cannot say that the arbiter circuit of Figure 6 is broken, but an extremely
simple modeling effort has been able to get uncomfortably close to measurement noise for
one implementation of the arbiter circuit. Similar experiments are currently underway for the
MAX circuit.
22
switch
arbiter
output
. . .
. . .
(ignored)
challenge b100
b101
b127
b128
Input
Output
. . .
Control
Input
Switch
Switch
. . .
Output
Control
bi
bi+1
(b) In a circuit
8.4.
Our experimental results suggest that the circuits we have proposed to date are not as strong
against modeling attacks as we would like. A number of directions can be taken to try to make
the circuit stronger, evaluating these options will be the object of future work.
Feed forward arbiter: An arbiter is placed at an intermediate point in the circuit, its output
drives one of switches later on in the circuit (Figure 11). Many such arbiters could be
added to the circuit for increased difficulty. The major problem with this approach is
that it adds discontinuities in the circuit, i.e., conditions in which a slight change in a
delay (due to noise, for example) can lead to a huge difference in the delay of a path.
Moreover, this circuit can easily be expressed in the additive delay model, and we fear
that a variant of the perceptron algorithm would be able to break it.
23
Variable delay buffers: Here we try to add interactions between the paths through the
circuit, as was the case with the lookup-table multiplexer implementation. This is done
by using what we call a variable delay buffer (see Figure 12). A component that directly
passes a signal from its input to its output, but that does so at a different speed depending
on the state of its other input. This approach is nice as it leads to continuous but nonmonotonic circuits (the delay of the circuit is a continuous function of the elementary
circuit delays, but the total delay might decrease as the delay of an elementary delay
increases). Moreover, by putting some effort into the delay characteristics of the variable
delay buffer, the circuit can be made hard to model in the additive delay model.
Change the circuit topology: So far we have only explored an extremely simple circuit
topology, there are scores of other possibilities left to explore. The three things to aim
for are reliability, variability and modeling difficulty. To achieve reliability one major
criterion is that the total circuit delay should vary slowly with the elementary component
delays. For variability (two chips with different delays have different responses), that same
variation should not be too slow. The hardest part is modeling difficulty; for now it is
essentially an open problem.
9.
Conclusion
In this paper we have presented a technique for delay-based circuit authentication and
conducted preliminary experiments that show that it is viable. By using this method, it is
possible to store secrets on a chip in a way that is less vulnerable to invasive attacks than
traditional digital methods.
Experimental results have shown that there is enough variations between integrated circuits
for identification purposes, and that the effect of temperature and power supply voltage
variations can be mitigated, allowing robust identification. More experiments are necessary
understand the effects of circuit aging on identification ability.
We have argued that delay-based authentication is not susceptible to conventional attacks
that attempt to discover a secret, hidden key. The most plausible attack we have found
against our devices is the model-building attack. Our experiments suggest that the current
implementation of our devices could be vulnerable to this type of attack, but we have suggested
a number of mechanisms that might prevent these attacks.
While a number of problems need to be solved in order to use delay-based authentication
in applications such as smart card authentication and software licensing, we believe that this
is a promising direction for future research.
REFERENCES
1. R. Anderson and M. Kuhn. Tamper Resistance - a Cautionary Note. In Proceedings of the Second Usenix
Workshop on Electronic Commerce, pages 111, Nov. 1996.
2. R. Anderson and M. Kuhn. Low Cost Attacks on Tamper Resistant Devices. In IWSP: International
Workshop on Security Protocols, LNCS, 1997.
24
3. R. J. Anderson. Security Engineering: A Guide to Building Dependable Distributed Systems. John Wiley
and Sons, 2001.
4. D. S. Boning and S. Nassif. Models of Process Variations in Device and Interconnect. In W. B.
A. Chandrakasan and F. Fox, editors, Design of High Performance Microprocessor Circuits, chapter 6.
IEEE Press, 2000.
5. D. Chinnery and K. Keutzer. Closing the Gap Between ASIC & Custom. Kulwer Academic Publishers,
2002.
6. F. Dartu and L. T. Pileggi. Calculating worst-case gate delays due to dominant capacitance coupling. In
Proceedings of the 34th annual conference on Design automation conference, pages 4651. ACM Press,
1997.
7. B. Gassend, D. Clarke, M. van Dijk, and S. Devadas. Controlled physical random functions. In Proceedings
of 18th Annual Computer Security Applications Conference, December 2002.
8. B. Gassend, D. Clarke, M. van Dijk, and S. Devadas. Silicon physical unknown functions. Technical
report, MIT LCS TR-833, May 2002.
9. B. Gassend, D. Clarke, M. van Dijk, and S. Devadas. Delay-based circuit authentication and applications.
In Proceedings of the 2003 ACM Symposium on Applied Computing, March 2003.
10. P. Kocher, J. Jaffe, and B. Jun. Differential Power Analysis. Lecture Notes in Computer Science,
1666:388397, 1999.
11. Y. Liu, S. R. Nassif, L. T. Pileggi, and A. J. Strojwas. Impact of interconnect variations on clock skew of
a gigahertz microprocessor. In Design Automation Conference. IEEE/ACM, June 2000.
12. K. Lofstrom, W. R. Daasch, and D. Taylor. IC Identification Circuit Using Device Mismatch. In
Proceedings of ISSCC 2000, pages 372373, February 2000.
13. A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press,
1996.
14. J.-J. Quisquater and D. Samyde. ElectroMagnetic Analysis (EMA): Measures and Counter-Measures for
Smart Cards. In Proceedings of Smart Card Programming and Security (E-smart 2001), LNCS 2140,
pages 200210, September 2001.
15. P. S. Ravikanth. Physical One-Way Functions. PhD thesis, Massachusetts Institute of Technology, 2001.
16. S. W. Smith and S. H. Weingart. Building a High-Performance, Programmable Secure Coprocessor. In
Computer Networks (Special Issue on Computer Network Security), volume 31, pages 831860, April 1999.
17. S. Theodoridis and K. Koutroumbas. Pattern Recognition. Academic Press, 2003.
18. N. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A Systems Perspective. Addison Wesley,
1985.