Reverse Engineering Circuits
Using Behavioral Pattern Mining
Wenchao Li
UC Berkeley
[email protected]
Zach Wasson
UC Berkeley
[email protected]
ABSTRACT
UC Berkeley
[email protected]
blocks identified from the unknown circuit, and the results combined to generate the final high-level description.
In this paper, we present a formalization of the problem of reverse engineering a high-level description (henceforth referred to
as REHLD) of a digital circuit, along with an initial approach to
solve it. The context we address is that of a system integrator needing to reverse-engineer the functionality of a component she did
not design. Our problem formalization is based on two main insights. First, what constitutes a high-level component depends on
the view of the system integrator, but many standard hardware design patterns can be useful in the reverse engineering process. For
example, components such as arbiters, FIFOs, adders, and register files are common hardware design patterns that appear in many
designs. Similarly, standard protocols are often used in buses and
interconnection networks, and the control logic for these structures
yields high-level FSM components. Second, the high-level components must be specified in an abstract, behavioral way, using mathematical, logical specifications. One can have many FIFO buffer
implementations, but all of them share the basic property of first-in, first-out data storage and transmission. Thus, an abstract component is really a specification of a family of concrete components,
all of which share a common set of properties.
We combine the formalism with an initial approach to systematically derive the high-level function of an unknown digital circuit,
given its gate-level netlist. The proposed approach is based on mining interesting behavioral patterns from the simulation traces of a
gate-level netlist and representing them as a pattern graph. A similar
pattern graph is also generated for library components. Our method
first computes input-output signal correspondences via subgraph
isomorphism on the pattern graphs. The general function of the unknown circuit is then determined by finding the closest match in the
component library, by model checking the unknown circuit against
each logical specification.
To summarize, the contributions of this paper include:
• A formal framework for the reverse engineering problem, based
on the notion of matching against abstract library components
(Section 2), and
• An approach for matching an unknown sub-circuit against an
abstract component library based on mining behavioral patterns
from simulation or execution traces, followed by model checking (Section 3).
We demonstrate our approach on publicly-available benchmarks,
including implementations of the serial peripheral interface (SPI)
bus standard (Section 4).
Systems are increasingly being constructed from off-the-shelf components acquired through a globally distributed, untrusted supply
chain. The lack of trust in these components necessitates additional
validation of the components before use. Additionally, hardware
trojans are becoming a pressing concern. In this paper, we present
a novel formalism and method to systematically derive the high-level function of an unknown circuit component given its gate-level
netlist. We define the high-level description of a circuit as an interconnection of instantiations of abstract library components characterized using logical specifications. The proposed approach is
based on mining interesting behavioral patterns from the simulation traces of a gate-level netlist, and representing them as a pattern
graph. A similar pattern graph is also generated for library components. Our method first computes input-output signal correspondences via subgraph isomorphism on the pattern graphs. The general function of the unknown circuit is then determined by finding
the closest match in the component library, by model checking the
unknown circuit against each logical specification. We demonstrate
the effectiveness of our approach on publicly-available circuits.
1. INTRODUCTION
Systems are increasingly being constructed from off-the-shelf components acquired through a globally-distributed, untrusted supply
chain. The lack of trust in these components implies that they must
be validated before use. Additionally, hardware trojans are becoming a pressing concern. One approach to address the problem, in
both cases, is to reverse-engineer the high-level functionality of the
circuit. In order to do this, there is a need for both a formalization of the problem of reverse engineering a high-level description
(HLD) and algorithmic techniques to derive an HLD from the
original circuit.
One of the challenges is simply to come up with a suitable formal problem definition. We restrict ourselves, in this paper, to
digital circuits. Several steps must be completed in order to define the reverse engineering problem and solve it. First, one must
define what “high-level” means. Intuitively, a high-level characterization of a circuit can be one that identifies the large functional blocks comprising the circuit, along with their interconnection. The space of possible functional blocks can be defined using
a library of high-level components, where a component is either
a commonly-used hardware design pattern or a custom finite-state
machine (FSM) block. Any circuit composed of such components
can be described in various ways, e.g., as a high-level netlist, or in a
suitable hardware description/programming language. Second, we
need a method of identifying candidate blocks in the unknown circuit to match against the high-level component library. Third, we
must compute a correspondence between input and output signals
of the unknown block and those of high-level components. Fourth,
given such a correspondence, we must formally define when an
unknown block is said to match a given high-level component.
Fifth, given this definition of component matching, we must verify
whether the unknown block indeed matches the high-level component. Finally, such a match must be computed for all candidate
978-1-4673-2340-6/12/$31.00 ©2012 IEEE
Sanjit A. Seshia
2. PROBLEM FORMULATION
We begin with some basic definitions and terminology in Sec. 2.1,
followed by the formal problem statement in Sec. 2.2.
2.1
Definitions and Terminology
2.1.1 Circuit and Traces
A bit-level netlist or circuit C is a tuple $(I, O, V_S, V_C, \mathit{Init}, A)$
where
• I is a finite set of input signals;
• O is a finite set of output signals;
• $V_S$ is a finite set of intermediate sequential (state-holding) signals;
• $V_C$ is a finite set of intermediate combinational (stateless) signals;
• Init is a set of initial states, i.e., initial valuations to elements of
$V_S$, and
• A is a finite set of assignments to outputs and to sequential and
combinational intermediate signals. An assignment is an expression that defines how a signal is computed and updated.
All signals are Boolean. A combinational assignment is denoted by
s ← e, where s is a signal and e is a Boolean expression. Similarly,
a sequential assignment is denoted by s := e. Input and output
signals are assumed to be combinational, without loss of generality.
An input-output trace (or simply, a trace) of a circuit is a sequence
of valuations $v_0, v_1, v_2, v_3, \ldots$ to input and output signals; i.e.,
each $v_i$ is a vector of 0-1 values to variables in $I \cup O$, and the
subscript $i$ denotes the cycle at which the valuation is recorded.
For simplicity, we restrict ourselves to valuations of signals at the
rising edge of the clock; the results of the paper extend to other
conventions as well. A finite trace of a circuit is a finite sequence
of input-output valuations $v_0, v_1, v_2, v_3, \ldots, v_k$.
An event $e$ is a tuple $\langle \mathbf{s}, \mathbf{v}, t \rangle$, where $\mathbf{s}$ is a set of signals and $\mathbf{v}$ is
the vector of their corresponding valuations at cycle $t$. We denote the valuation of
a Boolean signal $s$ at cycle $t$ as $v_{s,t}$.
Note that we do not define events as assignments of signals across
cycles, but this can be addressed by introducing suitable user-defined
events.
A delta event, denoted $\Delta e$, is an event such that at least one of its
constituent signals changes value from the previous valuation, i.e.,
$\Delta e := \langle \mathbf{s}, \mathbf{v}, t \rangle$ such that $\exists\, s \in \mathbf{s},\ v_{s,t} \neq v_{s,t-1}$. For a Boolean
signal $s$, we use $s_{0 \to 1}$ to denote the delta event of $s$ transitioning
from 0 to 1 and similarly $s_{1 \to 0}$ to denote the delta event of $s$ transitioning from 1 to 0.
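The definition of a delta event can be made concrete with a short sketch. It assumes a trace is stored as a list of per-cycle dictionaries mapping signal names to 0 or 1; the names `Trace` and `delta_events` and the sample signals `req` and `gnt` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Assumed representation: one valuation (signal name -> 0/1) per clock cycle.
Trace = List[Dict[str, int]]


def delta_events(trace: Trace, signal: str) -> List[Tuple[int, str]]:
    """Return (cycle, kind) pairs for every change of `signal`.

    kind is "0->1" for a rising delta event and "1->0" for a falling one.
    Cycle 0 has no previous valuation, so it never yields an event.
    """
    events = []
    for t in range(1, len(trace)):
        prev, cur = trace[t - 1][signal], trace[t][signal]
        if prev != cur:  # v_{s,t} differs from v_{s,t-1}
            events.append((t, f"{prev}->{cur}"))
    return events


example_trace = [
    {"req": 0, "gnt": 0},
    {"req": 1, "gnt": 0},
    {"req": 1, "gnt": 1},
    {"req": 0, "gnt": 0},
]
req_events = delta_events(example_trace, "req")
gnt_events = delta_events(example_trace, "gnt")
```

Here `req` rises at cycle 1 and falls at cycle 3, while `gnt` rises at cycle 2 and falls at cycle 3.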
2.1.2 Formal Specification
A formal specification of a sequential circuit C is a set S of input-output traces of that circuit. Intuitively, every trace in S is an allowed behavior of C and every trace outside S is disallowed.
There are broadly two ways to write a formal specification for a
digital circuit. The automata-theoretic approach is to describe S
as a finite-state machine over infinite input sequences [22]. The
other approach is to write a logical formula (or set of formulas)
characterizing the input-output behavior of the circuit. The latter
approach has gained favor in the EDA community over the years,
especially in the form of assertion languages that allow one to specify temporal properties of a system, and which are usually slight
extensions of linear temporal logic (LTL) [18]. We also follow this
logic-based approach in our work.
A brief overview of LTL is provided below.
We can express that “every request must be eventually followed by
a grant” in LTL as “G (request ⇒ F grant)”, where the operator
G specifies that globally at every point in time a certain property
holds, and F specifies that a property holds either currently or at
some point in the future.
In this paper, we mainly write specifications using a combination
of LTL and regular expressions, employing the following four patterns:
1. Alternating (A) An alternating pattern between two delta events
Δa and Δb is true when each occurrence of Δa alternates with
an occurrence of Δb. Note that this does not mean Δb follows
Δa immediately in the next cycle. This pattern can be described
by the regular expression (Δa Δb)∗ . Figure 1 shows the corresponding finite automaton (self-transitions are such that the
automaton is deterministic).
2. Next (X) The next pattern corresponds to the LTL formula
“G (Δa ⇒ X Δb).” One can easily generalize this pattern to
fixed-delay pairs.
3. Until (U) The until pattern can be used to describe behaviors
such as “the request line stays high until a response is received.”
Figure 2 shows a trace where this pattern is satisfied. Formally,
the LTL formula is “$\mathbf{G}\,(a_{0 \to 1} \Rightarrow \mathbf{X}\,(a \,\mathbf{U}\, b_{0 \to 1}))$.”
Figure 1: Finite Automaton for the Alternating Pattern
Figure 2: Request stays high until a response is received.
4. Eventual (F) The eventual pattern can be described by the LTL
formula “G (Δa ⇒ X F Δb).”
It is important to note that, even though we mainly focus on the
above temporal logic patterns, the problem formulation described
in this paper applies to all types of formal specifications.
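To make the Alternating, Next, and Eventual patterns concrete, the sketch below checks them on a finite trace in which each delta event has already been reduced to the list of cycles at which it occurs. LTL is defined over infinite traces, so the finite-trace reading used here is an assumption for illustration: an obligation that runs off the end of the trace is tolerated for Alternating and Next, but Eventual requires the matching event to appear within the trace. The function names are hypothetical.

```python
from typing import List


def alternating(a_cycles: List[int], b_cycles: List[int]) -> bool:
    """(Δa Δb)*: occurrences strictly alternate, starting with Δa.

    Assumptions: a trailing unmatched Δa is accepted on a finite trace, and
    if both events fall in the same cycle, Δa is ordered before Δb.
    """
    merged = sorted([(t, "a") for t in a_cycles] + [(t, "b") for t in b_cycles])
    expected = "a"
    for _, kind in merged:
        if kind != expected:
            return False
        expected = "b" if expected == "a" else "a"
    return True


def next_pattern(a_cycles: List[int], b_cycles: List[int], length: int) -> bool:
    """G(Δa ⇒ X Δb): every Δa at cycle t is followed by Δb at cycle t + 1.

    A Δa in the last recorded cycle is left unconstrained (assumption).
    """
    b = set(b_cycles)
    return all(t + 1 in b for t in a_cycles if t + 1 < length)


def eventual(a_cycles: List[int], b_cycles: List[int]) -> bool:
    """G(Δa ⇒ X F Δb): every Δa is followed by some strictly later Δb."""
    last_b = max(b_cycles, default=-1)
    return all(t < last_b for t in a_cycles)
```

For instance, `alternating([1, 5], [3, 7])` holds, whereas `alternating([1, 2], [3])` does not, because two occurrences of Δa appear before the first Δb.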
2.1.3 High-Level Description
Informally, a high-level description of a circuit is a composition of
high-level abstract components.
An abstract component α is a triple (I, O, S), where I and O are
sets of input and output signals, respectively, and S is a formal
specification defining allowed input-output behavior of the circuit.
An instance of an abstract component α is any circuit that satisfies
the specification S of α.
We illustrate the notion of an abstract component using an example.
EXAMPLE 1. Consider an arbiter servicing two (input) request
lines $r_0$ and $r_1$ with two grant lines $g_0$, $g_1$ as outputs. The abstract
arbiter component would comprise the following LTL properties:
$\mathbf{G}\,\neg(g_0 \wedge g_1)$
$[\mathbf{G}\,\mathbf{F}\,\neg(r_0 \wedge r_1)] \Rightarrow \mathbf{G}\,(r_0 \Rightarrow \mathbf{F}\,g_0)$
$[\mathbf{G}\,\mathbf{F}\,\neg(r_0 \wedge r_1)] \Rightarrow \mathbf{G}\,(r_1 \Rightarrow \mathbf{F}\,g_1)$
The first property states that both requests cannot be granted in the
same cycle. The last two properties state that a request must eventually be granted, provided there are infinitely many cycles where
no competing requests are present; the latter assumption is the
property $\mathbf{G}\,\mathbf{F}\,\neg(r_0 \wedge r_1)$.
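As a small illustration of what it means for a concrete circuit to satisfy part of this specification, the sketch below defines a toy, stateless fixed-priority arbiter (a hypothetical instance, not one of the circuits studied in the paper) and checks the mutual-exclusion property on the two grant lines. Because the toy arbiter is purely combinational, enumerating all four request combinations is enough to establish this safety property; the two liveness properties cannot be settled this way and would need a model checker.

```python
from itertools import product
from typing import Tuple


def fixed_priority_arbiter(r0: int, r1: int) -> Tuple[int, int]:
    """Toy combinational arbiter (hypothetical): r0 always wins over r1."""
    g0 = r0
    g1 = r1 & (1 - r0)  # r1 is granted only when r0 is not requesting
    return g0, g1


# Exhaustive check of "never both grants in the same cycle".
mutex_holds = all(
    not (g0 and g1)
    for g0, g1 in (
        fixed_priority_arbiter(r0, r1) for r0, r1 in product((0, 1), repeat=2)
    )
)
```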
A library of abstract components (component library, for short)
L is a set {α1 , α2 , . . . , αn } of abstract components. We will assume that each abstract component is also accompanied by at least
one concrete instance; this is reasonable, as the abstract component
library is typically constructed from observations of commonly occurring components in hardware designs.
2012 IEEE International Symposium on Hardware-Oriented Security and Trust
EXAMPLE 2. Examples of abstract components include common hardware design patterns and modules such as an arbiter,
FIFO buffer, adder, multiplier, content-addressable memory (CAM),
and crossbar. Abstract descriptions of finite-state machines are relevant for circuits that implement protocols; e.g., transmitter or receiver modules implementing the Ethernet or I2C protocols.
A high-level description (HLD) of a circuit C with inputs I and outputs O is a tuple (I, O, Γ) where Γ is a set of instances of abstract
components drawn from L such that the synchronous composition
of these instances is (sequentially) equivalent to the original netlist
C.
EXAMPLE 3. Consider the HLD of a chip multiprocessor (CMP)
router as shown in Figure 3. It is a composition of four high-level
modules. The input controller comprises a set of FIFOs buffering
incoming flits and interacting with the arbiter. When the arbiter
grants access to a particular output port, a signal is sent to the input controller to release the flits from the buffers, and at the same
time, an allocation signal is sent to the encoder which in turn configures the crossbar to route the flits to the appropriate output port.
3.1
Figure 3: CMP Router comprising four high-level modules (Input Controller with Buffers, Arbiter, Encoder, and Crossbar)
2.2
3. APPROACH
In this section, we give a detailed description of our approach based
on behavioral pattern mining and model checking.
In the rest of this paper, we address some of the key steps in the
above process. Step 1, library definition, depends on the requirements of the end user of the reverse engineering procedure. We use
a representative sample of communication protocols in our experiments. In general, a burden is placed on the end user to create an
initial component library. While this is a limitation for any library-based approach, common hardware design patterns [8] and standardized interfaces are ideal starting components for the library.
We do not address Step 2 (functional block identification) in this
paper, leaving it to future work. The major component addressed
by the remainder of the paper is Step 3. Specifically, for 3(a), we
show how temporal patterns mined from simulation or execution
traces of the circuits can be used to determine input-output correspondence. For 3(b), we use model checking [7] to determine
whether a given bit-level circuit satisfies the formal specification
associated with an abstract component.
Problem Definition
We are now ready to formally define the problem of reverse engineering a high-level description (REHLD) of a circuit.
DEFINITION 1. Given
• an unknown bit-level circuit $C = (I, O, V_S, V_C, \mathit{Init}, A)$, and
• a library $L = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ of abstract components,
the REHLD problem for $C$ is to derive a high-level description $C' = (I, O, \Gamma)$
where $C'$ is equivalent to $C$ and $\Gamma$ is a composition of
instances of abstract components from $L$.
As noted in Section 1, solving the REHLD problem involves multiple steps. We rephrase these steps using the notation introduced
in this section. There are three main steps:
1. Library definition: Constructing an abstract component library
L;
2. Functional block identification: Dividing the given unknown
circuit C into a set of functional blocks (sub-circuits) b1 , b2 , . . . , bk ,
where each bi is considered a candidate for matching against the
component library L;
3. Matching against component library: Given a candidate functional block (sub-circuit) bi and an abstract component library L
= {α1 , α2 , . . . , αn }, determine whether there exists an abstract
component αj such that:
(a) Input-output correspondence: Each input (respectively, output) signal of αj that appears in the formal specification of
αj is mapped 1-1 to some input (respectively, output) signal
of bi .
(b) Verification: bi is an instance of αj ; i.e., bi satisfies the specification Sj associated with αj .
Input-Output Correspondence
Given an unknown sub-circuit C and an abstract library component α, we must first compute the correspondence between their
input and output signals, if one exists. A brute-force approach will
have to try all possible permutations of the signals. In addition, the
numbers of inputs and outputs of the two circuits may not be identical. We use a heuristic procedure for this step. First, a concrete
instance of α (e.g., a reference circuit that satisfies the formal specification of α), denoted C′, is obtained. Then, given the two circuits
C with interface signals VI = I ∪ O and C ′ with interface signals
VI ′ = I ′ ∪ O′ , our method tries to find the corresponding signals
between VI and VI ′ . This problem is formally defined as follows.
DEFINITION 2. Given two sets of signals A and B, the signal
correspondence problem is to find a bijective mapping σ : Ā → B̄,
where Ā ⊆ A and B̄ ⊆ B. We say the mapping is maximum
if there does not exist another mapping σ ′ : Ã → B̃ such that
Ā ⊂ Ã ⊆ A or B̄ ⊂ B̃ ⊆ B.
Our approach to solving the signal correspondence problem is based
on mining patterns from a set of input-output traces of the two circuits. Other approaches, e.g., based on the structure of the circuits,
are also possible, and will be left to future work.
The key insight in our approach is that two signals are similar if
they exhibit similar behaviors in relation to other signals. In this
paper, we measure the similarity of two signals by checking if they
satisfy some particular patterns (in relation to other signals) in the
traces. However, our framework is general – one can use other
definitions of similarity such as statistical measures.
In a nutshell, our method uses a two-step combination of pattern
mining and graph matching. The pattern mining step infers likely
temporal properties of a circuit from input-output traces of that circuit. It is important to note here that for the unknown component,
we can only observe the trace induced by a test bench at the chip-level. This means that it is not possible to control the simulation as
in the case of the known component. Consequently, even if the unknown circuit is identical to the library component, the behaviors
of the two traces can be very different. In this work, we use pattern
mining as a way to concisely capture key features of a trace. The
mined properties are represented in terms of a pattern graph. Given
circuits C and C ′ , we compute pattern graphs for each of them.
Then, a graph matching procedure is used to compute the maximum common subgraph between these two graphs, which yields
the desired maximum bijection. We elaborate below.
3.1.1
Pattern Mining
Given a set of input-output traces and a pattern template as described in Section 2.1.2, the pattern mining step first generates a pattern graph $G = \langle V, E \rangle$,
where $E \subseteq V \times V$. A vertex $v \in V$ represents a delta event
of some signal in $I \cup O$. For example, a vertex labeled with $\Delta a_{0 \to 1}$
represents the event of signal $a$ transitioning from 0 to 1. There is
a directed edge $e = (u, v) \in E$ if and only if the pattern involving
the delta events represented by $u$ and $v$ is satisfied by all traces.
For example, given the pattern template “G (Δa ⇒ X (a U Δb)),”
there is an edge from a vertex labeled with $\Delta a_{0 \to 1}$ to a vertex labeled with $\Delta b_{0 \to 1}$ if and only if the pattern is satisfied by the set of
traces.
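The construction of a pattern graph for the Until template can be sketched as follows. This is a simplified illustration under stated assumptions, not the mining procedure used in the experiments: only rising delta events are considered, a trace is a list of per-cycle dictionaries, an obligation still pending when the trace ends is not counted as a violation, and an edge is added only if the triggering event occurs at least once (to avoid vacuously satisfied patterns). All function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Trace = List[Dict[str, int]]  # assumed: one valuation per clock cycle


def rises(trace: Trace, s: str, t: int) -> bool:
    """True if signal s has a 0->1 delta event at cycle t."""
    return t > 0 and trace[t - 1][s] == 0 and trace[t][s] == 1


def until_holds(trace: Trace, a: str, b: str) -> Tuple[bool, bool]:
    """Check G(a rises => X(a U b rises)) on one finite trace.

    Returns (satisfied, triggered); `triggered` records whether a ever rose.
    """
    triggered = False
    n = len(trace)
    for t in range(1, n):
        if not rises(trace, a, t):
            continue
        triggered = True
        for u in range(t + 1, n):
            if rises(trace, b, u):
                break  # b rose: the obligation is discharged
            if trace[u][a] == 0:
                return False, True  # a dropped before b rose
    return True, triggered


def pattern_graph(traces: List[Trace], signals: List[str]) -> Set[Tuple[str, str]]:
    """Edges (a, b) such that the Until pattern holds on all traces."""
    edges = set()
    for a, b in permutations(signals, 2):
        results = [until_holds(tr, a, b) for tr in traces]
        if all(ok for ok, _ in results) and any(trig for _, trig in results):
            edges.add((a, b))
    return edges


# A request r that stays high until the grant g rises.
r_vals = [0, 1, 1, 1, 0, 0]
g_vals = [0, 0, 0, 1, 0, 0]
demo_trace = [{"r": r, "g": g} for r, g in zip(r_vals, g_vals)]
demo_edges = pattern_graph([demo_trace], ["r", "g"])
```

On this trace the only edge is from the rising event of `r` to the rising event of `g`; the reverse edge is rejected because `g` falls before `r` rises again.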
As an example, let α be the arbiter described in Section 2.1.3. Suppose C uses a round-robin priority scheme and C ′ uses a fixed priority scheme for arbitration.
Given the input-output traces of C and C ′ and the Until pattern,
suppose that Figure 4 shows the pattern graphs generated for C and
C′. 1
Consider our arbiter example. Figure 5 shows a maximum common subgraph of the pattern graphs given in Figure 4. Hence, all
the request and response signals are mapped correctly using the MCS
approach.
Figure 5: MCS in the two arbiter pattern graphs
The MCS problem is known to be NP-hard [10]. Most complete
approaches are based on reformulating the problem into a maximum
clique problem in a compatibility graph between G and G′ [9].
The compatibility graph is a product graph of G and G′ such that a
vertex in the product graph is a pair of vertices (i, k) where i ∈ V
and k ∈ V′. There is an edge from (i, k) to (j, l) (i ≠ j and k ≠ l)
if and only if one of the following conditions holds:
• (i, j) ∈ E and (k, l) ∈ E ′ ;
(a) Round-robin priority
(b) Fixed priority
Figure 4: Arbiter Pattern Graphs
Table 1 shows how the named signals a-d of the round-robin priority arbiter and w-z of the fixed-priority arbiter correspond to request/grant signals r0 ,r1 ,g0 , and g1 as described in Sec. 2.1.3. (Our
task is to compute this correspondence, but we provide it here so
that the reader can follow this example.) Thus, for example, the
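edges from $a_{0 \to 1}$ to $c_{0 \to 1}$ and from $w_{0 \to 1}$ to $y_{0 \to 1}$ are instances
of the property $\mathbf{G}\,[(r_0)_{0 \to 1} \Rightarrow \mathbf{X}\,(r_0 \,\mathbf{U}\, (g_0)_{0 \to 1})]$ (request stays high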
until the corresponding grant is asserted).
Table 1: Signal Names in Arbiter Versions
Signals   Round-robin   Fixed
r0        a             w
r1        b             x
g0        c             y
g1        d             z
3.1.2
Graph Matching
A graph $G' = \langle V', E' \rangle$ is an induced subgraph of $G = \langle V, E \rangle$ if
$V' \subseteq V$ and $E' = E \cap (V' \times V')$. $G$ is said to be isomorphic
to $G'$ if there exists a bijective function $f : V \to V'$ such that
$\forall (u, v) \in V \times V,\ (u, v) \in E \iff (f(u), f(v)) \in E'$.
Given the two pattern graphs G and G′ corresponding to C and C ′
respectively, a common subgraph is a graph which is isomorphic
to induced subgraphs of G and G′ . We wish to find a maximum
common subgraph (MCS) between the two graphs, i.e., a common
subgraph between G and G′ that has the maximum number of vertices.
The key observation here is that the bijective function f that defines
the MCS is exactly the signal correspondence mapping σ that we
are looking for. In fact, since the vertices represent delta events,
our approach can identify corresponding signals even if they are
implemented with opposite polarities in the two circuits.
1 In the case where a pattern graph is composed of multiple disconnected subgraphs, we use the largest subgraph as the pattern
graph.
• (i, j) ∉ E and (k, l) ∉ E′.
In a directed graph G, we say two vertices u and v are connected
if both (u, v) ∈ E and (v, u) ∈ E. A clique is then a subset of
vertices such that every pair of vertices in this set are connected. A
maximum clique is a clique with the largest number of vertices.
We omit details of how the maximum clique problem is solved but
refer the readers to [6]. The technique can scale to graphs with hundreds of vertices. Solving the maximum clique problem correctly
generates the signal correspondence as depicted in Table 1.
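The reduction from MCS to maximum clique can be sketched in a few lines. The code below builds the compatibility graph for two small directed graphs and finds a maximum clique with a plain Bron-Kerbosch enumeration. That enumeration is a stand-in chosen for brevity, not the solver referred to in [6], and it is only practical for small graphs. Two product vertices are treated as connected when edge membership agrees in both directions, which corresponds to the definition of connected vertices above. The graphs `E` and `E2` are made-up toy inputs loosely modeled on the arbiter example.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]  # a pair (vertex of G, vertex of G')


def compatibility_graph(V, E, V2, E2) -> Dict[Node, Set[Node]]:
    """Undirected adjacency of the compatibility (product) graph."""
    nodes = list(product(V, V2))
    adj: Dict[Node, Set[Node]] = {n: set() for n in nodes}
    for (i, k), (j, l) in product(nodes, nodes):
        if i == j or k == l:
            continue
        same_fwd = ((i, j) in E) == ((k, l) in E2)  # both edges or both non-edges
        same_bwd = ((j, i) in E) == ((l, k) in E2)
        if same_fwd and same_bwd:
            adj[(i, k)].add((j, l))
    return adj


def max_clique(adj: Dict[Node, Set[Node]]) -> List[Node]:
    """Bron-Kerbosch without pivoting; returns one maximum clique."""
    best: List[Node] = []

    def expand(r: List[Node], p: Set[Node], x: Set[Node]) -> None:
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


# Toy pattern graphs: each request event points to its grant event.
V, E = ["a", "b", "c", "d"], {("a", "c"), ("b", "d")}
V2, E2 = ["w", "x", "y", "z"], {("w", "y"), ("x", "z")}
sigma = dict(max_clique(compatibility_graph(V, E, V2, E2)))
```

The resulting `sigma` maps all four vertices of the first graph to vertices of the second while preserving every edge. In this toy input the two request/grant pairs are interchangeable, so more than one maximum mapping exists; additional patterns or edges would be needed to break the symmetry.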
3.2
Matching by Verification
If all inputs and outputs of the abstract component α appearing in
its specification S are mapped onto some inputs and outputs of the
unknown circuit C, then we proceed to the next step, where we
verify whether C satisfies S. In our approach, we perform this step
using model checking [7], as S is a set of LTL properties. (If S
were a reference circuit, it would be possible to replace the model checker
with a sequential equivalence checker.) If C satisfies S, then we
terminate reporting that C matches the component α. Otherwise,
we report that it does not match (and move to trying to match C to
a different abstract component).
However, if some inputs/outputs of α appearing in S are not matched
to inputs/outputs of C, then we stop and declare that there is no
match between C and α. Note that this approach is conservative:
it is possible for C to be an instance of α but not pass this phase,
since our input-output signal correspondence algorithm is heuristic. However, it is important to note that our matching approach is
sound due to the use of formal verification.
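The overall matching procedure described in this section can be summarized as a short driver loop. In the sketch below, `correspond` and `verify` are hypothetical callables standing in for the pattern-graph matching of Section 3.1 and for a model checker, respectively; `AbstractComponent` and its fields are likewise illustrative names. The loop is conservative in the same way as the text: a component is skipped as soon as some signal in its specification has no counterpart in the unknown block.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class AbstractComponent:
    name: str
    spec_signals: FrozenSet[str]  # signals that appear in the specification S


def match_block(
    block: Any,
    library: Iterable[AbstractComponent],
    correspond: Callable[[Any, AbstractComponent], Dict[str, str]],
    verify: Callable[[Any, AbstractComponent, Dict[str, str]], bool],
) -> Optional[str]:
    """Return the name of the first library component the block matches."""
    for comp in library:
        sigma = correspond(block, comp)  # component signal -> block signal
        if not comp.spec_signals <= set(sigma):
            continue  # incomplete correspondence: declare no match
        if verify(block, comp, sigma):  # e.g. a model-checking run
            return comp.name
    return None
```

A caller would supply the two callables; a verifier that always returns False makes the function return None, which mirrors the case where no library component matches.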
While formal verification may not scale to the full circuit, we envisage applying our procedure mainly to blocks with hundreds to
thousands of flip-flops, which is within the capacity of state-of-the-art model checkers today. We demonstrate our approach on benchmarks from OpenCores, as discussed in the following section.
Lastly, it should be noted that while it may be difficult to write
specifications that account for all possible behaviors of each library
component, it is generally possible to distinguish one design block
from another using only a few logical specifications (e.g. distinguish an arbiter from an adder). We believe the automation that we
provide in the proposed approach can benefit reverse-engineering
greatly, as it is still mostly a manual process today.
4. RESULTS
We used the following three circuits obtained from OpenCores [2]
in our experiments.
• WISHBONE-compatible SPI (WB-SPI) [5]
• WISHBONE-compatible SimpleSPI (WB-SimpleSPI) [3]
• WISHBONE-compatible I2C (WB-I2C) [1]
The serial peripheral interface (SPI) is a full duplex, serial communication link. Devices operate in either master or slave mode.
Typically there is a single master device and one or more slave devices. SPI specifies four logic signals: SCLK (serial clock), MOSI
(master output/slave input), MISO (master input/slave output), and
SS (slave select). The SS signal is only necessary if more than one
slave device is connected to the master. To initiate a data transfer,
the SS pin for the desired slave is first pulled low. Then data are
clocked from the master to the slave via the MOSI port and data
are clocked from the slave to the master via the MISO port. When
the transfer is complete, the SS pin is pulled high. SimpleSPI is a
simplified version which supports only one master and one slave.
The inter-integrated circuit or I2C is a serial two-wire communication bus. Devices operate in either master or slave mode, similar to
SPI; however, there can be multiple masters on an I2C bus. The bus
is made up of two logic signals: SCL (clock) and SDA (data). A
typical data transfer starts with a master sending a START bit along
with a 7-bit address for the slave it wishes to communicate with and
a bit indicating a read or write operation. The slave responds with
an ACK bit and proceeds to operate in either read mode or write
mode, depending on the master’s request. Once the transmission is
over, the master sends a STOP bit.
WISHBONE is a communication interface for IP cores that enhances design reuse by enforcing compatibility between cores. Furthermore, WISHBONE is open source, which makes it easy for
engineers to share hardware designs. As such, many projects on
OpenCores, a website dedicated to open source hardware designs,
include a WISHBONE interface. The protocol supports handshaking, single read/write cycles, block read/write cycles, and read-modify-write cycles. All three circuits are supposed to be WISHBONE compatible.
We consider the scenario where the WISHBONE protocol [4] has
been pre-characterized as a library component and the WB-SPI
circuit is given as a concrete implementation of the WISHBONE
protocol. We treat the WB-SimpleSPI and WB-I2C as unknown
candidate functional blocks. The goal is to determine whether an
unknown circuit also implements the WISHBONE interface.
Signal Correspondence:
We followed the approach in [13] for generating the pattern graph
by mining the Until pattern on the simulation traces of each circuit.
The simulation traces were produced by the test benches provided
on OpenCores. We further assume the clock, reset (wb_rst_i) and
data-path signals at the interface are already identified (these are
easy to identify by structural methods). Next, a compatibility graph
was generated for the pattern graph of the unknown circuit and WB-SPI.² The time taken to generate each pattern graph was under a
second. We used the program Cliquer [17] which implements an
exact branch-and-bound algorithm developed by Patric Östergård
for finding the maximum clique in a graph. The time taken to find
a maximum clique in the compatibility graph was under a second
in all instances.
Table 2 shows the signal map derived between SPI and SimpleSPI
by using the first MCS produced. All the WB-related signals were
mapped correctly. In addition, two SPI-related signals were also
mapped correctly. The signal names used here were taken from
the respective RTL files and only serve as an illustration. In an actual reverse engineering exercise, the signal names of the unknown
circuit will be arbitrary. These results were obtained despite the
fact that the traces were generated by two test benches with very
different behaviors.
2 Observe that the pattern graph only needs to be generated once for
the library component.
Table 2: Signal Mapping between WB-SPI and WB-SimpleSPI
WB-SPI        WB-SimpleSPI
miso_pad_i    miso_i
wb_ack_o      ack_o
sclk_pad_o    sck_o
wb_we_i       we_i
wb_cyc_i      cyc_i
wb_stb_i      stb_i
Table 3 shows the signal map derived between SPI and I2C by also
using the first MCS produced.
Table 3: Signal Mapping between WB-SPI and WB-I2C
WB-SPI      WB-I2C
wb_we_i     wb_we_i
wb_cyc_i    wb_cyc_i
wb_ack_o    wb_ack_o
All the WB-related signals were again matched correctly. However,
the wb_stb_i signal was not matched. It was identified by iterating the approach with the Alternating pattern, given the current
matches. This suggests an incremental approach to our framework,
but this will be left to future work. On the other hand, there were
also four other incorrectly matched signals. This is because, other
than both being WISHBONE compatible, the two circuits were
actually implementing different functions. The next step, which
checks these signals against their logical specifications, will eliminate the incorrect matches.
Matching by Verification:
We focused on specifying the slave interface of the WISHBONE
protocol, although we used properties of the master interface as
assumptions.
At a minimum, a slave interface requires the following signals:
wb_ack_o, wb_clk_i, wb_cyc_i, wb_stb_i, and wb_rst_i. Since
the WISHBONE interface does not explicitly require the data lines,
wb_dat_o and wb_dat_i, the only properties we could specify
were about the reset operation and the handshaking protocol.
The properties for the reset operation and handshaking protocol are
as follows:
• G (wb_rst_i ⇒ X ¬wb_ack_o)
• G ¬(wb_ack_o ∧ X wb_ack_o)
• G ((wb_cyc_i ∧ wb_stb_i) ⇒ F wb_ack_o)
The assumptions we made on the master interface, part of the specification, are as follows:
• G (wb_rst_i ⇒ (¬wb_stb_i ∧ ¬wb_cyc_i))
• G (wb_stb_i ⇒ wb_cyc_i)
These LTL properties were manually translated from the WISHBONE documentation Rev. B.4 [4].
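The three slave-side properties can also be read as checks on a recorded simulation trace, which is sometimes a useful sanity test before running a model checker. The sketch below is a finite-trace monitor written for illustration only; it is not the Cadence SMV encoding used in the experiments, and it cannot prove the properties. The trace representation (a list of per-cycle dictionaries) and the function names are assumptions. The two safety properties are checked on consecutive cycles; for the liveness property, requests with no acknowledgment at or after them within the trace are reported as unanswered rather than as definite violations.

```python
from typing import Dict, List, Tuple

Trace = List[Dict[str, int]]  # assumed: one valuation per clock cycle


def safety_violations(trace: Trace) -> List[Tuple[int, str]]:
    """Check G(rst => X !ack) and G !(ack & X ack) on consecutive cycles."""
    violations = []
    for t in range(len(trace) - 1):
        cur, nxt = trace[t], trace[t + 1]
        if cur["wb_rst_i"] and nxt["wb_ack_o"]:
            violations.append((t, "ack asserted in the cycle after reset"))
        if cur["wb_ack_o"] and nxt["wb_ack_o"]:
            violations.append((t, "ack asserted in two consecutive cycles"))
    return violations


def unanswered_requests(trace: Trace) -> List[int]:
    """Cycles where cyc & stb hold but no ack occurs at or after that cycle."""
    last_ack = max((t for t, v in enumerate(trace) if v["wb_ack_o"]), default=-1)
    return [
        t
        for t, v in enumerate(trace)
        if v["wb_cyc_i"] and v["wb_stb_i"] and t > last_ack
    ]


def cycle(rst: int, cyc: int, stb: int, ack: int) -> Dict[str, int]:
    """Helper to build one valuation of the four interface signals."""
    return {"wb_rst_i": rst, "wb_cyc_i": cyc, "wb_stb_i": stb, "wb_ack_o": ack}


good_trace = [cycle(1, 0, 0, 0), cycle(0, 1, 1, 0), cycle(0, 1, 1, 1), cycle(0, 0, 0, 0)]
good_safety = safety_violations(good_trace)
good_pending = unanswered_requests(good_trace)
```

On `good_trace`, a single request is acknowledged one cycle after it is raised, so both result lists are empty.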
Taking the netlist descriptions of WB-SimpleSPI and WB-I2C, we
translated them to the SMV format and verified the properties described above using the Cadence SMV model checker [14]. For
both circuits, all the properties passed. This confirmed that both
indeed implemented the WISHBONE interface. The verification
times were under 0.1 second for either benchmark.
5. RELATED WORK
Digital system designers usually proceed from a high-level description to a gate-level netlist, and then to a physical layout and mask;
it is rare to proceed in the opposite direction. However, as noted
in Sec. 1, the study of reverse engineering of digital circuits has
been gaining importance in recent years. We review here the most
closely related work.
Hansen et al. [11] present a study of reverse engineering the well-known ISCAS-85 combinational circuits. They present several strategies, mostly manual, to reverse engineer circuit functionality from
a gate-level schematic. Some of these include looking for common
library components, repeated structures, computing truth tables of
small blocks, and identifying bus structures and control signals.
However, they do not formally characterize the problem. To the
best of our knowledge, ours is one of the first formal definitions
of the reverse engineering problem. The component library in our
work is at a much higher level of abstraction than that suggested by
Hansen et al. Moreover, our component matching is automated and
operates on sequential circuits.
Torrance and James [23] describe the practice of reverse engineering semiconductor-based products. Their approach includes product tear-downs (stripping packaging and disassembling the unit),
“system-level analysis” (identifying components on a board and
performing functional analysis through probing), process analysis, and circuit extraction (deriving a schematic from a stripped
IC). Our work is complementary to this effort. Once a gate-level
schematic is derived, our techniques can be applied to match modules within the schematic to an abstract component library.
Our work does not address reverse engineering of arbitrary finite-state machine functionality. While any sequential circuit can be
trivially viewed as a monolithic FSM, the challenge is to be able to
decompose that FSM into a set of smaller FSMs, each of which performs a distinguishable function, thus making the resulting high-level description easier for a human to understand. The recent work
by Shi et al. [20] is a step in this direction.
A key component of our work is to find the input-output signal
correspondence between an abstract component and a block in the
unknown circuit. There is little prior work in this area.
Mohnke and Malik [15] present a BDD-based approach for comparing two combinational circuits whose input correspondence is
not known a priori. The authors also extended their idea to finding latch correspondence for sequential circuits, by considering the
combinational circuit computing the next-state function [16]; this
does not address our problem, though, as sequential equivalence is
required between the two circuits. Our approach, based on mined
temporal properties and graph matching, is novel.
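For intuition about the combinational version of the correspondence problem, the Python sketch below searches for an input permutation under which two Boolean functions agree on every assignment. It uses exhaustive enumeration, not the BDD-based method of Mohnke and Malik, and it ignores phase (input or output negation). The function name find_input_correspondence and the two example functions are assumptions made for this illustration, and the search is only practical for a handful of inputs.

```python
from itertools import permutations, product
from typing import Callable, Optional, Tuple

BoolFn = Callable[..., bool]


def find_input_correspondence(f: BoolFn, g: BoolFn, n: int) -> Optional[Tuple[int, ...]]:
    """Return perm such that f(x) == g(x[perm[0]], ..., x[perm[n-1]]) for
    every assignment x of n Boolean inputs, or None if no permutation works."""
    assignments = list(product([False, True], repeat=n))
    for perm in permutations(range(n)):
        if all(f(*x) == g(*(x[p] for p in perm)) for x in assignments):
            return perm
    return None


def f(a: bool, b: bool, c: bool) -> bool:
    """Example function: the first input gates the OR of the other two."""
    return a and (b or c)


def g(p: bool, q: bool, r: bool) -> bool:
    """The same function with the gating input moved to the last position."""
    return r and (p or q)


mapping = find_input_correspondence(f, g, 3)
```

Here mapping[k] is the index of the input of f that drives the k-th input of g. The search space grows factorially in the number of inputs, and this combinational formulation does not carry over directly to sequential blocks, where behavior depends on state as well as on the current inputs.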
Our technique addresses the REHLD problem for a system integrator who has not designed the circuit being reverse-engineered, but
instead needs to verify its functionality prior to integration. We do
not address the problem of untrusted manufacturing and IC piracy, in which the designer is trusted; that problem can be tackled by techniques
such as EPIC [19]. Our technique is complementary to other recent work on malicious trojan circuit detection (e.g., [12, 21]). We
do not seek to find trojans; instead, we focus on determining whether a sub-circuit exhibits the correct behavior captured by a set of logical specifications. If our approach deems a sub-circuit to match
an abstract component, it is guaranteed to do so due to the use of
formal verification. In addition, if the sub-circuit violates some
security-related specification, our approach will report that as well.
6. CONCLUSION
We presented a new formal definition of the problem of reverse engineering a high-level description of an unknown digital circuit. A
solution strategy was sketched out and a new technique proposed
for the key step of matching a block in the unknown circuit to an abstract component library. Our technique is based on a combination
of pattern mining from input-output traces and model checking.
Experimental results demonstrate the promise of this approach.
There are several directions for future work. Structural techniques
can complement our behavioral approach to input-output signal
correspondence. New methods must be devised to find candidate
functional blocks in the overall circuit to match against the component library. Additionally, to complete the reverse engineering
process, one would need to integrate the block identification and
matching procedures into an iterative loop that finds the best “covering” of the unknown circuit with abstract high-level components.
Acknowledgements. This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) under the IRIS
program, and by the Hellman Family Faculty Fund.
7. REFERENCES
[1] I2c controller core. http://opencores.com/project,i2c.
[2] Opencores. http://opencores.com/.
[3] Simple spi core. http://opencores.com/project,simple_spi.
[4] Soc interconnection: Wishbone. http://opencores.org/opencores,wishbone.
[5] Spi core. http://opencores.com/project,spi.
[6] E. Balas and C. S. Yu. Finding a maximum clique in an arbitrary graph. SIAM J. Comput., 15:1054–1068, November 1986.
[7] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 2000.
[8] A. DeHon, J. Adams, M. DeLorimier, N. Kapre, Y. Matsuda, H. Naeimi, M. Vanier, and M. Wrighton. Design patterns for reconfigurable computing. In 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2004), pages 13–23, April 2004.
[9] P. J. Durand, R. Pasari, J. W. Baker, and C. Tsai. An efficient algorithm for similarity analysis of molecules. Internet Journal of Chemistry, 1999.
[10] M. R. Garey and D. S. Johnson. Computers and Intractability. Freeman, 1979.
[11] M. C. Hansen, H. Yalcin, and J. P. Hayes. Unveiling the ISCAS-85 benchmarks: A case study in reverse engineering. IEEE Design & Test of Computers, 16(3):72–80, 1999.
[12] M. Hicks, M. Finnicum, S. T. King, M. M. K. Martin, and J. M. Smith. Overcoming an untrusted computing base: Detecting and removing malicious hardware automatically. In IEEE Symposium on Security and Privacy, pages 159–172, 2010.
[13] W. Li, A. Forin, and S. A. Seshia. Scalable specification mining for verification and diagnosis. In Design Automation Conference, pages 755–760, 2010.
[14] K. McMillan. The Cadence SMV model checker. http://www.kenmcmil.com/smv.html.
[15] J. Mohnke and S. Malik. Permutation and phase independent boolean comparison. In European Conference on Design Automation, Feb. 1993.
[16] J. Mohnke, P. Molitor, and S. Malik. Establishing latch correspondence for sequential circuits using distinguishing signatures. In MidWest Symposium on Circuits and Systems, pages 472–476, 1997.
[17] S. Niskanen and P. Östergård. Cliquer - routines for clique searching. http://users.tkk.fi/pat/cliquer.html.
[18] A. Pnueli. The temporal logic of programs. In 18th Annual Symposium on Foundations of Computer Science (FOCS), pages 46–57, 1977.
[19] J. A. Roy, F. Koushanfar, and I. L. Markov. EPIC: Ending piracy of integrated circuits. In Proc. Design, Automation and Test in Europe (DATE), pages 1069–1074, 2008.
[20] Y. Shi, C. W. Ting, B.-H. Gwee, and Y. Ren. A highly efficient method for extracting FSMs from flattened gate-level netlist. In IEEE International Symposium on Circuits and Systems (ISCAS 2010), pages 2610–2613, 2010.
[21] C. Sturton, M. Hicks, D. Wagner, and S. T. King. Defeating UCI: Building stealthy and malicious hardware. In IEEE Symposium on Security and Privacy, pages 64–77, 2011.
[22] W. Thomas. Automata on infinite objects. In Handbook of Theoretical Computer Science, pages 133–164. Elsevier, 1990.
[23] R. Torrance and D. James. The state-of-the-art in IC reverse engineering. In 11th International Workshop on Cryptographic Hardware and Embedded Systems (CHES), volume 5747 of Lecture Notes in Computer Science, pages 363–381, 2009.