Smart Card SCA 2
Examination Committee
Chairperson: Prof. Luı́s Manuel Antunes Veiga
Supervisor: Prof. Ricardo Jorge Fernandes Chaves
Member of the Committee: Prof. Renato Jorge Caleira Nunes
May 2017
Acknowledgments
First of all, I would like to thank my coordinator, Professor Ricardo Chaves, whose advice helped me keep the work in the right direction and overcome some of the difficulties encountered.
I would like to thank my colleague Ricardo Maçãs for reviewing my thesis and providing feedback, and Jaganath Mohanty for having introduced and explained the setup components to me.
I would like to acknowledge WB electronics for providing the source code example
for the smart
card.
I would like to thank my family, especially my parents, for the constant support.
All of this, from
start to end, would have not been possible without them.
Last but not least, I would like to thank my girlfriend Joana Neno for her support and patience, especially on the tough days.
Abstract
Smart cards are ubiquitous devices used in many critical areas. They offer mechanisms against unauthorized access that protect the secret data stored in them, and they can also offer cryptographic operations such as data protection and authentication. However, power analysis offers non-intrusive techniques to extract sensitive information. One of the most commonly used attack classes is Differential Power Analysis (DPA), which is most appropriate for symmetric ciphers like AES. The signal-to-noise ratio was used as a complementary analysis to assess the noise on the recorded power traces. This work presents the fundamentals of CPA, the state of the art, and other statistical analysis algorithms. Building on this, an experimental setup is proposed to perform this type of analysis on smart cards and FPGAs. Finally, an experimental evaluation is performed to assess whether the setup can be improved with the use of external amplification and different power supplies. The results show that this type of setup can benefit from external amplification, but did not benefit as much when different power supplies were used. Also, using the best setup configuration, the secret key of an unprotected smart card was fully recovered using 25 power traces.
Keywords
Side-channel, Power analysis, Smart cards
Contents

1 Introduction
  1.1 Motivation
  1.2 Thesis Goals
  1.3 Requirements
  1.4 Document Structure
2 Background
4.1 Processing Units
  4.1.1 SAKURA-G/W
  4.1.2 Smart Card
4.2 Trace Acquisition
  4.2.1 PicoScope
  4.2.2 Programming the traces collecting program
5.1 Different Setup Configurations
  5.1.1 Signal-to-noise Ratio Comparison
  5.1.2 CPA Attack
List of Figures

2.3 Smart card contact pads
2.4 Smart card Application Protocol Data Units (APDU) format
2.5 Smart card communication protocol
2.6 AES AddRoundKey
2.7 AES SubBytes
2.8 AES ShiftRows
2.9 AES MixColumns
2.10 Differential Power Analysis
3.1 SAKURA-G
3.2 SAKURA-W
3.3 ChipWhisperer Starter Kit
3.4 ChipWhisperer Software Interface
4.1 Power analysis setup
4.2 PicoScope 6000
Abbreviations
AC Alternating Current
AES Advanced Encryption Standard
APDU Application Protocol Data Units
API Application programming interface
ASIC Application Specific Integrated Circuits
ATR Answer To Reset
BPS Bits Per Second
CD Carrier Detect
CMOS Complementary Metal-Oxide Semiconductor
COS Card Operating System
CPA Correlation Power Analysis
CPU Central Processing Unit
CRC Cyclical Redundancy Check
CTS Clear to Send
DCE Data Communication Equipment
DC Direct Current
DDR Data Direction Register
DES Data Encryption Standard
DLL Dynamic Link Library
DPA Differential Power Analysis
DSR Data Set Ready
DTE Data Terminal Equipment
EEPROM Electrically-Erasable Programmable Read-Only Memory
FIA Fault Injection Attacks
FPGA Field-Programmable Gate Array
HAL Hardware Abstraction Layer
HD Hamming-Distance
HW Hamming-Weight
I/O Input/Output
ISO/IEC International Organization for Standardization and International
Electrotechnical Commission
ISO International Organization for Standardization
MIPS Millions of Instructions Per Second
MMU Memory Management Unit
OS Operating System
PAA Power Analysis Attacks
POI Points-Of-Interest
RAM Random-Access Memory
RI Ring Indicator
ROM Read Only Memory
RSA Rivest Shamir Adleman
SCA Side-Channel Attacks
SIM Subscriber Identity Module
SNR Signal-to-Noise Ratio
SPA Simple Power Analysis
SRAM Static Random Access Memory
XOR Exclusive-or
1
Introduction
Contents

1.1 Motivation
1.2 Thesis Goals
1.3 Requirements
1.4 Document Structure
Smart cards have become very popular over the years and are used in many security systems. They can safely store sensitive information, such as secret keys, thanks to built-in security mechanisms that make them tamper-resistant. Smart cards can also perform cryptographic operations using the secret key they hold, meaning the secret key is never exposed.
Side-channel attacks [1] have proven to be a very effective means of attacking cryptographic algorithms. They exploit sensitive information leaked by a device during operation, ultimately compromising the secret information of the cryptographic system.
Paul Kocher has proven in his pioneering work [2] that a smart card can be
compromised easily, if
adequate protection mechanisms against power analysis attacks are not deployed.
Power analysis is
a type of side-channel attack, where the instantaneous power consumption of a
device can be used
to compromise the secret it holds. This is possible since the consumption of a
device depends on the
data and operations that are being processed.
The two main power analysis attacks are Simple Power Analysis (SPA) and Differential Power Analysis (DPA). SPA relies on the direct interpretation of a power trace and is used mainly to retrieve information about the operations being performed, but it can also retrieve sensitive information in particular cases. DPA is a more elaborate attack, relying on the statistical analysis of multiple power traces. In general, the attacker needs little to no implementation detail about the device under attack [3].
1.1
Motivation
Smart cards are distributed worldwide and used in many of today's industries. Because they are so widespread and carry sensitive information, they are a very desirable target for attackers. Smart cards have multiple mechanisms that detect when the card is working under abnormal conditions or when attempts are made to probe or tamper with its components [4]. Side-channel attacks can retrieve information from smart cards by measuring power consumption, electromagnetic fields, timing or even sound [5]. These unintended leaks of sensitive information might look harmless, but they can ultimately compromise a device's secret information.
Power analysis attacks (PAA) are of interest because they do not need to tamper with the normal functionality of the device. They measure the power consumption during the device's operation and perform statistical analysis over the collected data, defeating smart card security mechanisms that did not contemplate this type of attack and are more focused, for example, on protecting the device against physical access. Power analysis attacks have proven over the years to be very effective against cryptographic devices, according to the existing state-of-the-art research [1, 6-9].
The need to understand and measure the effectiveness of these attacks on devices that hold sensitive information, such as smart cards, is significant. Improving these attacks while, at the same time, implementing countermeasures is the way to stay one step ahead of attackers.
1.2
Thesis Goals
In order to know how cryptographic devices can be exploited with the use of power analysis, it is necessary to understand how this type of attack works, and why and which devices might be vulnerable.
The goal of this work is to build a setup that allows one to perform this type of analysis. The analysis should allow the recovery of the secret key from an unprotected smart card implementation and assess the quality of the gathered power traces.
This setup will focus mainly on supporting smart cards, but it can also be extended to other devices such as Field-Programmable Gate Arrays (FPGAs). Different setup configurations will then be tested in order to find out how they affect power trace collection and analysis.
Finally, this document is intended to serve as a stepping-stone for those who wish
to understand
the concepts of power analysis and need to perform their own experiments and
evaluate their own
systems.
1.3
Requirements
1.4
Document Structure
This document describes how power analysis is able to retrieve a secret key from a cryptographic device and how it can be done in practice.
Chapter 2 provides the base information needed to understand why power analysis is possible and what types of statistical analysis can be performed in order to retrieve information from the recorded power consumption.
Chapter 3 presents the state of the art on power analysis platforms, attacks that improve on the base key-recovery attack, and some protection mechanisms that can increase the difficulty of recovering the key.
Chapter 4 describes the proposed setup and discusses its implementation. The trace-collecting setup components are presented, as well as all the software developed to support the overall setup.
Chapter 5 presents the evaluation of the proposed solution. The several setup configurations are evaluated by comparing the number of required traces. Several evaluation methods and related statistical algorithms are also compared to assess their effectiveness.
Chapter 6 concludes the dissertation by summarizing the work developed and presents possible future work directions.
2
Background
Contents

2.1 Smart Cards
2.2 Cryptographic Algorithms
2.3 Side-Channel Analysis Attacks
2.4 Signal Characteristics
2.5 Welch's T-Test
This chapter presents background material for a better understanding of later chapters. Section 2.1 presents an introduction to smart card characteristics and components. Section 2.2 presents two types of ciphering algorithms: the Advanced Encryption Standard (AES) and Rivest Shamir Adleman (RSA). Section 2.3 explains how power analysis works, followed by two techniques to perform such analysis. Section 2.4 presents the characteristics to take into account when gathering signals and explains the components of power traces. Finally, Section 2.5 presents a complementary analysis technique named the t-test.
2.1
Smart Cards
Plastic cards have been in use since the 1950s. The first security mechanisms relied on visual features of the card, such as security printing and a signature panel. The successor was the magnetic stripe card, which allowed the storage of digital data, enabling the card to be read by machines. The problem with this technology is the possibility of reading, deleting and re-writing the data stored in the magnetic stripe with the right equipment.
With the creation of the integrated circuit and its subsequent inclusion on a plastic card, the term smart card was born. A smart card is a card-shaped device with an integrated circuit that provides a way to store data securely. There are two main smart card types: memory cards and microcontroller cards [10].
Memory cards include dedicated logic for security, providing access control to data, for example a write/erase protection mechanism. These cards are designed for one specific purpose, which restricts their flexibility. However, this makes them inexpensive to manufacture. They are used in applications that need storage and minimal data protection, such as health insurance cards.
Microcontroller cards can be seen as miniaturized computers with an Operating System (OS), storage, memory and an Input/Output (I/O) port. They also have the ability to create, delete and manipulate files and to process data. This gives them the ability to execute applications and perform functionality dynamically, offering at the same time secure application transactions and data protection. A great security advantage they have is the ability to perform cryptographic operations inside the card, meaning the secret data is never exposed.
Smart cards communicate with the external world using physical contacts or
electromagnetic
fields. Hybrid smart cards also exist, meaning that they possess the two means of
communication
[10].
2.1.1
Physical Characteristics
ISO/IEC 7816-1 specifies the physical characteristics of a smart card [10]. The most common format is the ID-1 card, with dimensions of 85.60 × 54.00 × 0.76 mm. These are the usual credit-card-shaped cards used in industries such as finance and healthcare. Figure 2.1 shows this common format.
Another common format is the ID-000 card, known as the format of Subscriber Identity Module (SIM) cards.
Figure 2.1: All three smart card formats. [11]
These cards have dimensions of 25 × 15 × 0.76 mm and are used mainly in cell phones to identify and authenticate subscribers towards the cell phone operator. There is also a smaller version of the ID-000 card, named mini-UICC, the smallest card format produced. There is also a third format, named ID-00 or 'mini-card', with a size between the ID-1 and ID-000. This format has not yet been established internationally.
2.1.2
Architecture
2.1.3
Standards
Several companies worldwide produce smart cards and the infrastructure to communicate with them. To guarantee compatibility and interoperability between different manufacturers, the International Organization for Standardization and International Electrotechnical Commission (ISO/IEC) 7816 was created to specify smart card standards.
Contact cards that are International Organization for Standardization (ISO) compliant must follow parts one, two and three of ISO 7816 [12]: part one describes the physical characteristics, part two the location and dimensions of the contacts, and part three the transmission protocols and electronic signals.
Contactless cards must comply with the additional ISO/IEC 14443, divided into four parts: part one describes the physical characteristics, part two the radio frequency power and signal interface, part three the initialization and anti-collision mechanism, and part four the transmission protocol.
2.1.4
2.1.5
Smart cards use what are called Application Protocol Data Units (APDU) to exchange data with a terminal. The communication is specified by ISO/IEC 7816-4 [12].
The two main components of an APDU are the header and the body. The header has a fixed size and is always present, while the body can vary in size and is not required for some instructions. The header can be divided into four elements: the class byte (CLA), the instruction byte (INS) and two parameter bytes (P1, P2). As for the body, it has three components: the command data length Lc, the data field and the expected response length Le. The left part of Figure 2.4(a) shows a visual representation of the APDU structure.
Figure 2.4: Smart card APDU format: (a) command APDU; (b) response APDU.
The response APDU is composed of a response body and a trailer (SW1, SW2), with the body being optional and the trailer being of fixed size. The body contains the data produced by the previous command, which can be empty if the previous operation did not return anything. SW1 and SW2 contain the response code, indicating whether the processing was successful or not. The right part of Figure 2.4(b) shows a visual representation of the returned APDU format.
For example, a verify command, to check whether the user's inserted PIN (0000) matches the smart card's internal one, is represented as the APDU 0x94 0x20 0x80 0x00 0x04 0x30 0x30 0x30 0x30, where 0x94 identifies the instruction class, 0x20 the instruction, 0x80 and 0x00 are parameters 1 and 2 respectively, 0x04 is the data length and the four 0x30 bytes are the PIN digits in ASCII. If the PIN is correct, the returned APDU is 0x90 0x00, where 0x90 and 0x00 are the trailer bytes SW1 and SW2 respectively.
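The APDU layout described above can be sketched in a few lines of Python. This is an illustrative helper, not part of the thesis setup code; the function names are hypothetical, but the byte layout follows ISO/IEC 7816-4 as described in this section.

```python
def build_apdu(cla, ins, p1, p2, data=b""):
    """Build a command APDU: fixed header (CLA, INS, P1, P2) plus
    an optional body (Lc followed by the data field)."""
    apdu = bytes([cla, ins, p1, p2])
    if data:
        apdu += bytes([len(data)]) + data  # Lc, then the data field
    return apdu

def parse_response(resp):
    """Split a response APDU into its optional body and the fixed-size
    (SW1, SW2) trailer."""
    return resp[:-2], resp[-2], resp[-1]

# The VERIFY command from the text: PIN "0000" as ASCII digits (0x30 each)
cmd = build_apdu(0x94, 0x20, 0x80, 0x00, b"0000")
print(cmd.hex())  # 942080000430303030

body, sw1, sw2 = parse_response(bytes([0x90, 0x00]))
print(hex(sw1), hex(sw2))  # 0x90 0x0
```

The body stays empty here because a successful VERIFY returns only the status trailer.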
2.1.6
2.1.7
Security Mechanisms
One of the main features that make smart cards so desirable is their ability to protect the data they hold. To achieve that, tamper-resistance mechanisms are implemented to withstand physical and logical attacks.
Some of the countermeasures against physical attacks [13] are: the programmable active shield, a protective layer that covers the smart card microcontroller and prevents the chip components from being analysed or probed; the Memory Management Unit (MMU), which acts like a firewall, preventing smart card applications from accessing privileged resources that should only be accessed by the OS; data bus encryption, which ciphers data passed over the bus, preventing an attacker from knowing what values are being transmitted; sensors integrated with the microcontroller that prevent abnormal operation of the smart card by monitoring things like the internal and external clock frequency, voltage, temperature and other components; the Cyclical Redundancy Check (CRC), which checks for data errors during transmission, reading or writing; and finally, the current masking device, a security mechanism against power analysis that operates by performing random dummy access operations in memory, changing the power consumption of the device during operation.
Figure 2.5: Smart card communication protocol structure.
2.1.8
Operating system
Operations on smart cards are controlled and monitored by the Card Operating System (COS), a small OS specific to each type of card [10]. COSs are divided into two groups: the general-purpose COS, which has generic commands that work for many applications, like the Java Card; and the dedicated COS, which has instructions for specific applications and can also contain the application within itself. A COS can also be classified as an open or a proprietary platform. An open platform means third parties can load programs onto the smart card, while a proprietary platform means the opposite: only the producer of the COS can install programs on the card.
Java Card is one of the most used COSs worldwide. It is an open, multi-application operating system based on Java, intended to abstract the hardware from the programming language, i.e. programmers do not need to worry about the hardware specifications when developing a program. This is accomplished by translating programs to bytecode, which is then interpreted by a virtual machine. The virtual machine is responsible for running the program while handling the hardware specifications and resources. Java Card allows a smart card to hold multiple Java Card applications (applets), while granting isolation between them through the use of firewalls. This has the advantage of allowing different vendors to have their applets on the same smart card, independently of the level of security and testing each applet has.
MULTOS is another popular smart card operating system, known for being used when there is a need for high security and performance. Applications are typically written in the C language and then compiled into the MULTOS Executable Language (MEL). Like Java Card, it also allows multiple applications on a single smart card, while providing isolation between them. The main difference comes at production, where MULTOS manufacturers must comply with a licence that obliges them to perform rigorous security and interoperability tests on these smart cards.
2.2
Cryptographic Algorithms
2.2.1
AES
subBytes, shiftRows and mixColumns, where each one either replaces the values of the state with new ones or changes their positions. The result, after all the operations, is the ciphered data. As for the deciphering operation, it is the same process with the same number of rounds and operations, but in reverse order. Next, the four aforementioned operations are described in more detail:
The AddRoundKey operation performs an Exclusive-or (XOR) between the state and a round key. A round key is generated from the secret key via a key scheduler and can also be represented as a 4x4 matrix. Figure 2.6 illustrates the addRoundKey operation.
The SubBytes operation performs a non-linear substitution on each byte, meaning every position of the state is replaced by another value. The substitution function is named the S-box and works by replacing the state bytes either by using a formula or by using a pre-computed table with all the possible values. Figure 2.7 illustrates the subBytes operation.
The ShiftRows operation performs a number of row rotations that depends on the row position: row n is shifted n - 1 times. This means the first row stays the same, the second is shifted one position, the third two positions and the fourth three positions. Figure 2.8 illustrates the shiftRows operation.
The MixColumns operation mixes the data of each column independently of the other columns. To achieve that, each state column is multiplied by a fixed polynomial. Figure 2.9 illustrates the mixColumns operation.
The only difference to this sequence is in the tenth round, where mixColumns is not performed and an extra addRoundKey is computed. The pseudocode of this algorithm is shown below:
aes_ciphering(byte plaintext, word round_key) {
    byte state;
    state = plaintext;
    AddRoundKey(state, round_key[0]);
    for (i = 1; i < num_rounds; i++) {
        SubBytes(state);
        ShiftRows(state);
        MixColumns(state);
        AddRoundKey(state, round_key[i]);
    }
    SubBytes(state);
    ShiftRows(state);
    AddRoundKey(state, round_key[num_rounds]);
    return state;
}
2.2.2
RSA
M^e (mod n) = C (ciphertext)    (2.1)

The private key is a pair Ppr = (d, n), where d is the value used to raise the ciphertext C and n is the value used in the modular operation. The decipher operation is presented as follows:

C^d (mod n) = M (message)    (2.2)
RSA and other asymmetric cryptographic algorithms are based on modular exponentiation. The most used technique to perform a modular exponentiation is known as the square-and-multiply algorithm. The algorithm works by checking whether the last bit of the exponent is one or zero. If it is zero, the algorithm performs a square operation with the current result and the exponent is shifted one position to the right. If the last bit is one, the previous operations are applied plus a multiplication between the current result and the base. The following presents a possible implementation of the algorithm:
squareAndMultiply(x, n) {
    if (n < 0)
        return squareAndMultiply(1 / x, -n);
    else if (n == 0)
        return 1;
    else if (n == 1)
        return x;
    else if (isEven(n))
        return squareAndMultiply(x * x, n / 2);
    else // is odd
        return x * squareAndMultiply(x * x, (n - 1) / 2);
}
This implementation is vulnerable to SPA, since the execution of the multiply operation depends on the secret key bits [3]. This issue is addressed in detail in Section 2.3.2.
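To make the leak concrete, the following illustrative Python sketch (not from the thesis) implements a left-to-right modular square-and-multiply and records the operation sequence. The multiply step runs only for exponent bits equal to one, which is exactly the pattern an SPA attacker reads off the power trace; the `trace` parameter is a hypothetical stand-in for the oscilloscope.

```python
def square_and_multiply(base, exponent, modulus, trace=None):
    """Left-to-right modular exponentiation. The multiply ('M') step runs
    only for exponent bits equal to 1, which is what SPA observes."""
    result = 1
    for bit in bin(exponent)[2:]:  # scan exponent bits, MSB first
        result = (result * result) % modulus  # square: done for every bit
        if trace is not None:
            trace.append("S")
        if bit == "1":
            result = (result * base) % modulus  # multiply: only for 1-bits
            if trace is not None:
                trace.append("M")
    return result

ops = []
r = square_and_multiply(5, 0b1011, 23, ops)
print(r)             # matches pow(5, 11, 23)
print("".join(ops))  # SMSSMSM -> reads back the key bits 1, 0, 1, 1
```

A square followed immediately by a multiply marks a 1-bit; a lone square marks a 0-bit, so the exponent is recovered directly from the operation sequence.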
2.3
Side-Channel Analysis Attacks
2.3.1
Power Analysis
Most common modern digital circuits are build using complementary Metal-OxideSsemi
Conductor
(CMOS) cells. This technology has the particularity of having significant power
consumptions when
the internal logic cell values change. This type of power consumption is dynamic
and occurs when
logic cells change their internal values from 0 → 1 or 1 → 0. Logic cells that
maintain their internal
state have a static consumption, leading to very little power consumptions. This
happens because
when there is a state transition it is usually required to charge or discharge a
condenser, used to
maintain the logical state.
The state of the logic cells depends on the input and on the operations a device is performing. By measuring the power consumption, an attacker can acquire knowledge about the operations and data being processed at a given moment. This information leak can ultimately lead to the discovery of sensitive data, such as secret keys on a cryptographic device. This type of analysis is called a PAA, where a correlation between the power consumption and the key-dependent operations performed on the device is evaluated.
An attacker has, in most cases, limited to no knowledge about a device's implementation, so to simulate the power consumption of a device, simple power models are used. The most commonly used are the Hamming-Distance (HD) and Hamming-Weight (HW) models [5].
The Hamming-Weight power model is a simple model that considers the number of bits set to one at a given moment to describe the power consumption of a device. This model requires almost no knowledge about the device structure; it only needs the value being processed at a given time.
The Hamming-Distance model counts how many bits changed from 1 → 0 and 0 → 1 over a transition. This model requires the attacker to know the predecessor and successor values at a given time. It also requires some knowledge about the device implementation, which in most cases is not available.
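Both power models reduce to counting set bits, as the following minimal Python sketch shows (illustrative code, not from the thesis; the function names are chosen for clarity). The HD model is just the HW of the XOR of the two values, since XOR marks exactly the bits that flipped.

```python
def hamming_weight(value):
    """Number of bits set to one: the HW power model."""
    return bin(value).count("1")

def hamming_distance(before, after):
    """Number of bits that flipped during a transition: the HD power model.
    Bits that differ are exactly the set bits of before XOR after."""
    return hamming_weight(before ^ after)

print(hamming_weight(0xB5))          # 0b10110101 has five 1-bits -> 5
print(hamming_distance(0xFF, 0x0F))  # the upper four bits flip -> 4
```

In an attack, these values serve as the hypothetical power consumption of a register holding (HW) or transitioning to (HD) a key-dependent intermediate value.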
These power models are then used in PAAs. PAAs are divided into three types: SPA, DPA and High-Order DPA. The first two are addressed in later sections.
The following sections refer only to software-implemented algorithms on an 8-bit smart card, since they are simpler to understand. For a power analysis attack on an AES Application Specific Integrated Circuit (ASIC) implementation, refer to [19].
2.3.2
Simple Power Analysis
i.e. the algorithm implementation is translated into a set of instructions that run in sequence on the device.
The instruction set of a device can be divided into four subsets: the arithmetic, logical, data-transfer and branching sets. These sets work with different components, such as the arithmetic-logic unit, the co-processor, the RAM, the ROM and the peripheral components. Each component has its own implementation and purpose, giving them different power consumption patterns when running. This means the different components can be observed in the power traces, making it possible to identify which instructions are running at a given time.
Instructions have their own power characteristics, leading to a potentially severe risk if their execution depends on the secret key. For a device running an algorithm with such characteristics, the key can be compromised without much effort. The best example can be seen in some public-key cryptosystem implementations using square-and-multiply modular exponentiation [3]. The squaring operation is performed for every bit of the secret key, while the multiply operation is performed only for key bits equal to one. Since the square and the multiplication can be distinguished in the power trace, retrieving the secret key is as simple as checking when only a squaring, or a squaring and a multiply, are performed. Figure 2.10 depicts an example of a power trace during a square-and-multiply operation.
Even if the sequence of operations does not depend on the secret key, visual
inspection can still
be useful in cases where the algorithm running on a device is not known. It can
also give some hints
about how an algorithm is implemented.
Figure 2.10: An illustrative example of how Simple Power Analysis can be used to
identify the bits being processed by visually inspecting the instantaneous power
consumption.
2.3.3
Differential Power Analysis
Among power analysis attacks, DPA is more effective than SPA in most cases. One of the reasons is that it does not require detailed knowledge about the attacked device; in most cases it is sufficient to know the algorithm running on the device [5].
DPA requires many traces in order to be successful, but its effectiveness in recovering the secret key is superior to SPA, being able to recover keys even if the power traces are noisy. The attack looks for data dependencies at specific points of the power trace, instead of patterns in a complete trace.
This type of analysis is performed in five steps [5], detailed in the following:
Step 1: Choosing an Intermediate Result of the Executed Algorithm. An intermediate stage of the algorithm running on the attacked device is chosen. This stage must be a function that depends on a known value, the algorithm's input or output data, and a portion of the secret key.
All possible values of the key portion used in the intermediary operation are generated and stored in a vector k. The elements of vector k are typically called key hypotheses. Then a matrix V is built containing all the possible intermediary values, using vector k and the data vector d. Each line of the matrix holds all the possible intermediary values for one input datum.
Since every column value depends on one hypothetical key and the input data, one of the columns will have the same values as the attacked device. Figure 2.12 illustrates the process.
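The construction of matrix V can be sketched in Python as follows. This is an illustrative toy, not the thesis tooling: the chosen intermediate result here is simply d XOR k, whereas a real AES attack would use the S-box output S(d XOR k); the input values in d are made up.

```python
d = [0x3A, 0x7F, 0x01]  # known input data, one entry per recorded encryption
k = list(range(256))    # all 256 key-byte hypotheses

# V has one row per input datum and one column per key hypothesis;
# entry V[i][j] is the hypothetical intermediate value for input d[i]
# under key hypothesis k[j].
V = [[d_i ^ k_j for k_j in k] for d_i in d]

print(len(V), len(V[0]))  # 3 256
print(hex(V[0][0x05]))    # 0x3a ^ 0x05 = 0x3f
```

Exactly one column of V (the one for the true key byte) matches the intermediate values the device actually computed.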
Step 5: Comparing the Hypothetical Power Consumption Values with the Power Traces. In the final step, each column of the hypothetical power consumption matrix H is compared with all the columns of matrix T. This operation compares the hypothetical power consumption of every key hypothesis with every position of the recorded traces. All results are then stored in matrix R. These comparisons between matrices are performed using statistical analysis, such as the correlation coefficient or the distance-of-means method [3]. Figure 2.14 illustrates the process.
In the end, if the power model and the number of traces used are adequate, the key is found by searching for the row of the resulting matrix R that has the highest value.
It is worth mentioning that when the correlation coefficient is used in step 5, this attack is named Correlation Power Analysis.
Figure 2.11: Illustration of the power consumption measurement (step 2).
2.4
Signal Characteristics
This section covers the conversion of an analog signal to a digital one, explaining how the conversion is done and how to minimize the loss of information during the process. Then the components that constitute each sample of a power trace are presented.
2.4.1
Analog to Digital
The power consumption of a device can be seen as an analog signal, i.e. continuous in the time domain. To be processed and analysed by a digital device, like a computer, the analog signal must be converted from continuous to discrete. This conversion is done using sampling and quantization. Sampling converts the time axis to a finite number of sampled values. Quantization is performed on the y axis, where the amplitude of the signal is mapped to a finite set of values, as illustrated in Figure 2.15.
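The two operations can be sketched as follows; the sample rate and bit depth below are arbitrary illustrative choices, not values used in this work:

```python
import numpy as np

def sample_and_quantize(signal, t_end, fs, bits):
    """Sample an 'analog' signal (a function of time) at rate fs, then
    quantize its amplitude into 2**bits discrete levels."""
    n = int(t_end * fs)
    t = np.arange(n) / fs                     # sampling: discretize the time axis
    x = np.array([signal(ti) for ti in t])    # instantaneous amplitudes
    levels = 2 ** bits
    lo, hi = x.min(), x.max()
    # quantization: map each amplitude onto the nearest of `levels` values
    q = np.round((x - lo) / (hi - lo) * (levels - 1)).astype(int)
    return t, q

# e.g. a 50 Hz sine sampled at 1 kHz for 0.1 s with an 8-bit ADC
t, q = sample_and_quantize(lambda ti: np.sin(2 * np.pi * 50 * ti), 0.1, 1000, 8)
print(len(q), q.min(), q.max())  # 100 0 255
```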
2.4.2
Power traces
The total power consumption at each point is the sum of four components. The operation-dependent component is represented by Pop. The data-dependent component is represented by Pdata. The other two, electronic noise and constant power consumption, are represented by Pel.noise and Pconst respectively.
Electronic noise happens when an undesired signal or signals exist and are recorded. This is present in every practical measurement. The way to characterize it is by performing the same operations with constant input several times and averaging the result. Since the noise is more or less random, it will be partially averaged out.
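This averaging procedure can be sketched as follows, with a synthetic trace shape standing in for a real measurement:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 1000 recordings of the same operation with constant
# input, each being the same deterministic signal plus electronic noise.
true_signal = np.sin(np.linspace(0, 4 * np.pi, 500))
traces = true_signal + rng.normal(0.0, 0.5, size=(1000, 500))

# Point-wise averaging attenuates the random noise by roughly 1/sqrt(N)
averaged = traces.mean(axis=0)

print(np.abs(traces[0] - true_signal).mean())   # error of one noisy trace
print(np.abs(averaged - true_signal).mean())    # far smaller after averaging
```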
Constant power consumption is usually ignored, since it does not carry any useful information. The presence of Pconst in a measurement is normally caused by current leaks.
Internal value changes on transistors that are independent of the data being processed and the operations being performed are called switching noise. The total power on each sample of a recorded power trace can be formulated as:
Ptotal = Pop + Pdata + Pel.noise + Pconst
(2.3)
If an attacker wants to retrieve useful information from the power traces, he can use different types of power analysis on Pop and Pdata. Each type of analysis targets different properties of those components, meaning that he can target a complete component or just a small part of it. The part used in the attack is called the exploitable component, denoted by Pexp. The part not used is called switching noise, denoted by Psw.noise. The relation between these components can be defined as:
Pop + Pdata = Pexp + Psw.noise
(2.4)
Considering this new relation, the total power composition can be formulated as:
Ptotal = Pexp + Psw.noise + Pel.noise + Pconst
(2.5)
The analysis of the signal containing useful information, Pexp, becomes more difficult to perform the higher the values of Psw.noise and Pel.noise are. One way to quantify the leakage of information at one point is to use a signal-to-noise ratio (SNR) metric. In the context of power analysis, the SNR can be written as follows:
SNR = Var(Pexp) / Var(Psw.noise + Pel.noise)
(2.6)
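One common way to estimate this SNR from recorded traces is to group them by the processed data value, take the variance of the per-group means as the exploitable signal and the mean of the per-group variances as the noise. The sketch below uses synthetic traces with a single leaking sample:

```python
import numpy as np

def snr(traces, labels):
    """Estimate the SNR at each sample point of a set of traces.

    traces: (n_traces, n_samples) power measurements
    labels: (n_traces,) data value processed in each trace
    Signal = variance over groups of the group means (data-dependent part);
    noise  = mean over groups of the within-group variances.
    """
    groups = [traces[labels == v] for v in np.unique(labels)]
    means = np.array([g.mean(axis=0) for g in groups])
    variances = np.array([g.var(axis=0) for g in groups])
    return means.var(axis=0) / variances.mean(axis=0)

# Synthetic traces: only sample 10 depends on the processed data
rng = np.random.default_rng(1)
labels = rng.integers(0, 4, size=2000)
traces = rng.normal(0.0, 1.0, size=(2000, 50))
traces[:, 10] += labels                  # inject data-dependent leakage
ratio = snr(traces, labels)
print(ratio.argmax())  # 10
```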
2.5
Welch’s T-Test
Properly acquired power traces are essentially a signal that needs to be processed in order to retrieve useful information and ultimately discover the secret key of an electronic device. Until now, CPA analysis was used to retrieve this secret key, but this method has one disadvantage: it is computationally expensive to correlate thousands of large traces, when in the end usually only a portion of the trace is relevant to the recovery of secret keys. This section presents a statistical tool named the t-test, to try to find these relevant points in a power trace without heavy computation or, depending on the attack, thousands of traces.
The t-test is a statistical hypothesis test used to distinguish whether two sets of data are significantly different from one another. This test follows a Student's t-distribution, used when the data being analysed is normally distributed, the number of samples is small and the standard deviation is unknown. Welch's t-test is an adaptation of the regular t-test for sets that have unequal variances and sample sizes.
This method by itself does not reveal the secret key of a device, but it can show potential Points-Of-Interest (POI) by distinguishing trace samples that have different power consumptions, revealing possible differences in the signal and thus possible leakages. This is of special importance when dealing with a large number of traces with many samples, since it reduces the correlation to the points presenting potential interest. The t-test formula is defined as follows:
t = (X1 − X2) / sqrt(s1^2/N1 + s2^2/N2)
(2.7)
where X1, X2 are the sample means of datasets 1 and 2 respectively, s1^2, s2^2 are the sample variances and N1, N2 are the numbers of samples.
The degrees of freedom v associated with the variance can be calculated using the following formula:
v ≈ (s1^2/N1 + s2^2/N2)^2 / ((s1^2/N1)^2/(N1 − 1) + (s2^2/N2)^2/(N2 − 1))
(2.8)
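Equations (2.7) and (2.8) can be sketched directly in code (for reference, scipy.stats.ttest_ind with equal_var=False computes the same statistic); the two sample sets below are synthetic stand-ins for two trace groups:

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic (eq. 2.7) and degrees of freedom (eq. 2.8)."""
    n1, n2 = len(a), len(b)
    v1, v2 = np.var(a, ddof=1), np.var(b, ddof=1)  # sample variances s1^2, s2^2
    se1, se2 = v1 / n1, v2 / n2
    t = (np.mean(a) - np.mean(b)) / np.sqrt(se1 + se2)
    dof = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, dof

# Synthetic stand-ins for two trace groups with unequal variances and sizes
rng = np.random.default_rng(2)
group1 = rng.normal(0.0, 1.0, 500)
group2 = rng.normal(0.5, 2.0, 800)
t, dof = welch_t(group1, group2)
print(t, dof)  # a large |t| flags the two groups as significantly different
```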
This method works in the power analysis context because different input data cause different intermediary values. These intermediate values cause a distinct, noticeable power consumption, whereas instructions that are independent of the input data have the same power consumption. The points where the consumption is different are potential places that handle sensitive data, like the XOR operation between the secret key and the plain text. For this to work, two distinct power trace groups need to be created by using two pre-selected plain text groups. The categories are presented below:
• Fixed vs. Fixed - The two sets of input data have only one data value different from each other. This method has the advantage of being completely independent of the algorithm being used, but usually at the cost of increasing the false negatives.
• Fixed vs. Random - One set has one fixed value and the other has random values. This one is very simple and has the advantage of being completely independent of the algorithm being analysed.
• Semi-fixed vs. Random - One set has data that produces some fixed intermediary values and the other set only has random data. Taking AES as an example, the semi-fixed data set can be one that sets half of the bytes, after the first round, to zero and the other half to random values. This method requires some knowledge of the algorithm being analysed.
The advantage of using this method, compared with the correlation attack, is that it requires far fewer traces to give meaningful results. The main disadvantage is the loss of intuition on where the main leakage points are located. This happens because the t-test tends to give a range of points around the point of interest, while CPA is more precise. There are also possible false negatives, where the t-test does not report POIs in places where sensitive information is being handled.
Some techniques can be used to improve the results, like the "Paired t-test" [20], where the values from the two sets are given alternately to the crypto device, so its internal state changes between ciphering operations.
3
State of the Art
Contents
3.1 SCA Platforms
3.2 Attacks
3.3 Defence Mechanism
3.4 Conclusion
This chapter introduces some of the signal processing methods and procedures that can be used to obtain better traces and discusses some platforms currently used by industry to assess the security of electronic devices. This work will focus on the smart card. Section 3.1 presents some of the side-channel analysis platforms. Section 3.2 presents two state-of-the-art attacks that can increase the effectiveness of the CPA attack. Section 3.3 gives an overview of two defence mechanisms that allow reducing possible signal leakage. Finally, section 3.4 gives a conclusion for this chapter.
3.1
SCA Platforms
SCA platforms, in the context of this work, refer to the software and hardware used to assess the security of cryptographic devices against power analysis attacks. There are two platform categories:
• Ad-hoc/home-made setups, which can consist of an oscilloscope to measure the power consumption, a resistor in series with the device's power supply and some scripts to analyse the obtained data.
• Commercial tools and software developed by companies or the academic community, with the purpose of testing the security of electronic devices.
The following presents some of the most used platforms, both in software and hardware, with a brief introduction describing their functionality as well as some of their advantages and disadvantages: for hardware, the SAKURA-G/W and the ChipWhisperer; for software, the ChipWhisperer's software module.
3.1.1
SAKURA-G/W
SAKURA-G (Figure 3.1) and SAKURA-W (Figure 3.2) are both testing platforms to evaluate the leakage and related security of implemented cryptographic modules. They are designed for researching and evaluating side-channel leakage, such as SCA and Fault Injection Attacks (FIA).
Figure 3.2: Top view of the SAKURA-W.
SAKURA-W is designed to serve as an adapter that sits on top of the SAKURA-G to enable smart card security tests. SAKURA stands for Side-channel Attack User Reference Architecture.
This platform is well known in the SCA community, being the one chosen for the DPA contest [22], a website that hosts a competition where people submit their power analysis algorithms, and the best one is chosen.
SAKURA-G is a 140 mm x 120 mm board, composed of two programmable FPGAs: a Xilinx Spartan-6 (XC6SLX75) for the cryptographic circuit (the main FPGA) and a Spartan-6 (XC6SLX9) for the control circuit (controller FPGA). The main clock oscillator for these FPGAs runs at 48 MHz, but this value can be scaled up or down when programming the FPGAs.
The board is designed to be low-noise, in terms of power analysis, and comes with an on-board amplifier to facilitate power analysis. The amplifier has a 360 MHz bandwidth with a +20 dB gain. It has two power sources, one via USB and the other from an external power supply. Also, it comes with two sets of 40 user I/O pins, where one set is controlled by the main FPGA and the other by the controller FPGA.
SAKURA-W is an expansion board; it uses the SAKURA-G's controller FPGA to deliver the commands to the smart card. It has a smart card reader, no amplifier and only one set of 40 I/O pins.
The main disadvantage of this platform is its high cost, which can be a barrier for those wanting to start studying the field of power analysis.
3.1.2
ChipWhisperer
The project provides a board that a user can configure with the desired algorithm, plug into the computer and start experimenting in the field of side-channel attacks using the provided software (depicted in Figure 3.4), while maintaining a low cost when compared to other commercial products.
One of the first products, still in production, is the ChipWhisperer-Lite (depicted in Figure 3.3), which is formed by a measuring board and a target board. Both boards come connected to each other, needing to be broken apart, and the connectors need to be soldered. There is also a more expensive version that comes with the components already soldered.
As for the board components, the measuring board has a 10-bit analog-to-digital converter, an Atmel SAM3U high-speed USB interface, a Xilinx S6LX9 FPGA, a +55 dB low-noise amplifier and one MOSFET for glitch generation, and the board is powered by a micro-USB port, which also serves as the communication interface. The target board comes with an 8/16-bit XMEGA128 microcontroller that guarantees real-time performance, i.e. that the system responds within specific time constraints, and has low power consumption. At the time, the price of the ChipWhisperer-Lite was $250 USD, and other options, like the board with the connectors already soldered or a more robust version, are also available to buy on the website [24].
The product is designed to be low cost, while performing at the same level as other more expensive products, by providing synchronous capture. The concept of synchronous capture is to sync the trace collection with the target device's clock and multiply or divide the base frequency. This helps keep the traces aligned, while allowing the device to have cheaper parts when compared to high-end oscilloscopes.
The software is written in Python 2.7, designed to be cross-platform, and comes in two modules: the capture module, used to configure the oscilloscope, communicate with the device and gather the power traces; and the analyser module, used to visualize and process the captured data. This software is also compatible with regular oscilloscopes such as the PicoScope.
3.2
Attacks
The following presents two attacks that can be used to improve the CPA analysis. First, template attacks are presented, where the device's consumption is characterized to reduce the number of traces required to discover the correct key. Then collision attacks are presented, which reduce the number of key guesses by finding equal intermediary values for different plain texts.
3.2.1
Template Attacks
Template attacks [5, 25] are based on the characterization of power traces using multivariate normal distributions. This technique depends on the data being processed and can be divided in two parts: the template building and the template matching.
The main idea of this attack is to record the power trace of a set of sequenced instructions on a device equal to the one under attack [5], using different data and secret keys. Then the power consumption trace is recorded and the multivariate normal distribution is computed. The result of this will be a template for each pair of data and key used on the profiling device. The final step is the recording of the power trace on the device under attack and its comparison with the templates built. The result of this operation is a probability, measuring how well a given template suits the power trace.
Template Building Phase
In the template building phase, a single instruction or a set of sequenced instructions is characterized. The targeted instructions must manipulate the secret value while the power consumption is recorded. For the instruction characterization to work, different pairs of data and keys (di, kj) must be used. Then, from the traces gathered, a multivariate normal distribution composed of a mean vector and a covariance matrix is computed. There are three typical strategies used to build templates: the usage of pairs of data and keys, where multiple data and keys are used to build a template; the usage of intermediary values, where a function that uses (di, kj) is characterized for all its possible values; and the usage of power models like the Hamming-weight or Hamming-distance.
Template Matching Phase
In the template matching phase, the probability density function and the power trace are evaluated using the Gaussian/Normal distribution seen in the following formula:
p(t; (m, C)di,kj) = (1 / sqrt((2 · π)^NIP · det(C))) · exp(−(1/2) · (t − m)' · C^−1 · (t − m))
(3.1)
where m is the mean vector, C is the covariance matrix, t is the power trace and (di, kj) are the data and key used to build the template. From the formula, a probability is obtained indicating how well the power trace suits the template, outputting the highest value for the best match. During the computation, numerical problems can arise while computing the covariance matrix inversion and exponentiation.
Performing the exponentiation on small numbers can lead to numerical problems. One solution is to apply the logarithm to the equation. This has the consequence that now the smallest absolute value of the logarithm indicates the correct key, and not the highest one. To avoid the inversion of matrix C, it is set to the identity matrix; doing so discards the covariance between points, considering only the mean vector. Finally, the resulting equation still has one exponentiation, and to remove it the logarithm is again applied, avoiding possible numerical problems when using small exponential values. The final expression is as follows:
ln p(t; (m, C)di,kj) = −(1/2) · (ln((2 · π)^NIP) + (t − m)' · (t − m))
(3.2)
With these additional operations, the template that produces the smallest absolute
value is the one
that indicates the correct key.
Example using the Hamming-weight
An example of the use of the Hamming-weight can be given for the MOV instruction, which moves part of the secret key to a register. An attacker can characterize the Hamming-weight on a device under his control, by checking the power consumption for each value (from 0 to 8 on an 8-bit microcontroller). If each of the different weights has a distinct power consumption, a template can be built for each Hamming-weight value. Then he gathers the power trace on the device under attack, at the approximate moment when the MOV operation is being performed, and compares it with the expected values. This will give the Hamming-weight of the key portion being processed by the MOV instruction. If this is repeated for every key portion, it reduces the number of guesses an attacker has to make to retrieve the secret key.
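The resulting key-space reduction can be sketched as follows (the leaked weight of 4 is an illustrative value):

```python
def hamming_weight(b):
    """Number of set bits in a byte."""
    return bin(b & 0xFF).count("1")

def candidates_for_weight(w):
    """All byte values whose Hamming weight matches the one leaked by MOV."""
    return [b for b in range(256) if hamming_weight(b) == w]

# If the template match reveals weight 4 for one key byte, the attacker is
# left with C(8,4) = 70 candidates for that byte instead of 256.
cands = candidates_for_weight(4)
print(len(cands))  # 70
```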
Template-Based Attack for DPA
Template-based attacks improve regular DPA attacks by reducing the probability of wrong key guesses. They can be seen as an extension of the template attacks for SPA, where the device's consumption is characterized.
The basis of this attack is the question: given a trace ti, what is the probability of the key being kj, written as p(kj|ti)? Consider ti as an element of power trace vector t and kj as an element of the possible key guesses vector k. Using Bayes' theorem [3] the following formula is deduced:
p(kj|ti) = p(ti|kj) · p(kj) / Σ(l=1..n) (p(ti|kl) · p(kl))
(3.3)
Bayes' theorem can be seen as an update function: it receives as input the key probabilities p(kl) that do not consider ti, while its output value considers it. The input key probabilities p(kj) are known as prior probabilities and the output p(kj|ti) is known as the posterior probabilities.
The previous formula works for one trace, but in practice multiple traces are used to gather more information about the secret key. Now using matrix T, where each line can be seen as a power trace vector, the mathematical condition is written as p(kj|T). Since every trace is statistically independent, applying Bayes' theorem leads to the following formula:
p(kj|T) = (Π(i=1..m) p(ti|kj)) · p(kj) / Σ(l=1..n) ((Π(i=1..m) p(ti|kl)) · p(kl))
(3.4)
Finally, the probabilities p(kj) and p(ti|kj) need to be determined in order to calculate p(kj|T). The value of p(kj), since all key values are equally likely, is 1/K, where K is the number of possible keys. As for the values of p(ti|kj), they are calculated after step 3 of the DPA attack, where the probabilities are based on the power trace matrix M and the matrix V of all possible intermediary values.
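The repeated Bayesian update over several traces can be sketched as follows; the likelihood values are hypothetical, standing in for the p(ti|kj) obtained from the DPA attack:

```python
import numpy as np

def bayes_update(prior, likelihoods):
    """One application of eq. (3.3): posterior over key guesses given one trace.

    prior: (K,) prior probabilities p(k_l)
    likelihoods: (K,) values p(t_i | k_l) for the observed trace t_i
    """
    unnorm = likelihoods * prior
    return unnorm / unnorm.sum()

K = 4                                   # toy key space, for illustration only
p = np.full(K, 1.0 / K)                 # uniform prior, p(k_j) = 1/K
# hypothetical per-trace likelihoods that favour key index 2
trace_likelihoods = [np.array([0.1, 0.2, 0.6, 0.1])] * 5
for lik in trace_likelihoods:           # eq. (3.4) = repeated use of eq. (3.3)
    p = bayes_update(p, lik)
print(p.argmax())  # 2
```

Each trace sharpens the posterior, so the correct key guess dominates after only a few updates.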
3.2.2
Collision Attacks
During encryptions using different plain texts and an unknown key, it is possible that intermediary functions produce the same values. When this event occurs, it is said that a collision happened. The importance of a collision is that it can only occur for a certain subset of all possible key values. For example: considering an intermediary function f(di, kj), where di, kj are the input data and the unknown key respectively, if f(d1, k1) = f(d2, k1) and d1 ≠ d2, then k1 can only assume a reduced number of values for the two function values to be equal. This reduces the number of key guesses needed to find the correct key. This type of attack uses side-channel analysis, in this case power analysis, to detect the internal collisions.
On AES, a collision attack can be performed on the MixColumns transformation. For example, the MixColumns operation can be presented as a matrix multiplication:

[x0]   [02 03 01 01]   [a0]
[x1] = [01 02 03 01] × [a1]
[x2]   [01 01 02 03]   [a2]
[x3]   [03 01 01 02]   [a3]
(3.5)
For the attack to work, one must consider only plain text values that make both d0 = d1 = 0 and d2 = d3. When two plain texts producing d2 ≠ d2′ and d3 ≠ d3′ produce the same output, information about k2 and k3 can be deduced.
3.3
Defence Mechanism
Until now, attacks against cryptographic devices were covered, where the power consumption of a device is correlated with the values it processes. To reduce the correlation between these values and the power consumption, there are two main methods: hiding and masking.
Hiding tries to make the power consumption independent of the operation being performed and the intermediary values being processed, either in software or hardware. At the software level, to make the power consumption appear random, instruction delays, dummy operations and instruction shuffling are inserted in the program. These operations are controlled by randomly generated values that are used to decide how long the delays are, how many dummy operations are performed and where the instructions are shuffled. This method has the disadvantage of increasing the power consumption and the processing time.
At the hardware level, the device can be built to consume an equal amount of power for every operation and data processed. One way to do this is by using dual-rail logic, where logic cells receive and output a value and its complement. Combined with precharged logic, which puts the output of a logic gate at a specific value of either 1 or 0, this always produces the same sequence of bit transitions. This method has the disadvantage that, besides increasing the manufacturing cost, it is not possible to make a device's power consumption 100% independent of each operation and the data processed.
Masking has the same objective as hiding, making the device's consumption independent of its intermediary values, but it uses random values to mask the intermediary values, instead of trying to change the device's power consumption. This requires only changes to the algorithm so it applies and removes the random mask from its intermediary values. As an example, consider v to be the intermediary value and m a random value generated internally by the algorithm. The masked value is the result of XORing the two values: vm = v ⊕ m. This type of defence mechanism works because the consumption of a device depends on the values being processed. If something random is added before the intermediary value is processed, the consumption of the masked value will appear random, making the consumption independent of the processed value.
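A minimal sketch of this masking idea, using toy byte values (a real masked AES implementation must also handle masked S-box lookups, which this does not show):

```python
import secrets

def masked_xor_with_key(v, k):
    """Compute v XOR k without ever handling the unmasked value v directly."""
    m = secrets.randbits(8)      # fresh random mask m
    vm = v ^ m                   # masked value vm = v XOR m; only vm is processed
    result_masked = vm ^ k       # operate on the masked value
    return result_masked ^ m     # remove the mask at the end

v, k = 0x3A, 0x5C
assert masked_xor_with_key(v, k) == v ^ k
print(hex(masked_xor_with_key(v, k)))  # 0x66
```

The intermediate value vm that the device actually processes is uniformly random, so its power consumption carries no direct information about v.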
3.4
Conclusion
This chapter presents some of the state-of-the-art SCA platforms, attack methods and defence mechanisms. It starts by presenting two types of platforms for side-channel analysis, where one is an ad-hoc/home-made platform and the other a commercial CPA product. Next, some more advanced attack methods are presented, starting with template attacks, which rely on characterising the device's power consumption, and collision attacks, which try to identify when a collision happens to reduce the number of key guesses. Finally, the hiding and masking defence mechanisms are presented, which increase the CPA resistance: hiding tries to directly change the power consumption of the device, while masking tries to conceal the intermediary values before processing them. For more information on this topic refer to [26].
4
Proposed Solution and
Implementation
Contents
4.1 Processing Units
4.2 Trace Acquisition
4.3 Signal Analysis
4.4 The overall setup usage
4.5 Conclusion
This chapter presents the proposed setup, describing both its components and the analysis scripts developed.
This work intends to provide a setup that allows the assessment of an electronic device's security using power analysis. The setup components can be divided into three main parts: the device under test, the trace acquisition and the trace analysis. The device under test is the equipment from which the power traces are retrieved while it performs cryptographic operations. In this work the devices used were a smart card and an FPGA, both configured with an AES algorithm.
The component that gathers traces is the oscilloscope, measuring the power consumption of the device being tested, together with a Python program running on a PC that configures and coordinates the trace collection both on the oscilloscope and on the device under test. The component that analyses the traces is composed of the analysis scripts that are executed on a PC; they perform various statistical analyses on the power traces, not only to retrieve the secret key, but also to assess the signal quality or find potential points that can be exploited. Figure 4.1 illustrates the components and their relationships.
It is important to understand how the components work together, so the user can make the most of the setup. The following sections provide a deeper explanation of the setup. This chapter presents some of the challenges encountered while configuring and programming the setup and the decisions made to overcome them. If a user wishes to use or understand the setup, make improvements to it, or is developing their own setup, the following sections can help save time, since much of the discovery and problem solving was done while developing this one.
This chapter is divided into five sections. Section 4.1 presents the work developed with regard to the target devices, from their characteristics to the software developed. Section 4.2 presents the work developed on the trace collection device; it presents the characteristics of the oscilloscope used and the development of the trace collection Python program and the communication interface with the targeted devices. Section 4.3 presents the scripts developed for trace analysis, explaining their purpose and structure. Section 4.4 presents an overview of how the setup can be assembled and configured, both for SAKURA-G and SAKURA-W. Section 4.5 presents some concluding remarks for this chapter.
4.1
Processing Units
This section presents and explains all the work related to the two target devices used, the FPGA and the smart card. Although both are physical devices, this section is divided between software and hardware, because the nature of the work was more software programming in the case of the smart card and more hardware setup and testing in the case of the FPGA.
It is worth mentioning that the smart card programming has a larger section when compared with the SAKURA-G, since the SAKURA-G already provided the source code and did not need any changes in its behaviour. On the other hand, the provided smart card software required licensing, meaning that the source code was not publicly available, which prevented the software behaviour from being changed. To solve this restriction, the smart card software was developed from scratch. Next, the work developed and the decisions made on both devices will be presented.
4.1.1
SAKURA-G/W
the Verilog-HDL.
In terms of triggering, on the SAKURA-G it is done by sending the signal through one of the top external pins. The first 4 pins are set at the start of the key scheduling, the first AES round, the last round and for all rounds. On the SAKURA-W, the 8th pin of the second row of the bottom pins is mapped to the smart card's pin AUX1. This pin signals the trigger, and when it goes ON is controlled by the smart card software.
4.1.2
Smart Card
A smart card can be seen as a miniaturized computer, having a CPU, RAM, ROM and storage memory. The smart cards used in this project were acquired from a company named WB Electronics [27], the same entity that supplied the smart cards that came with the SAKURA board. This was important in order to ensure compatibility between the smart cards that came with the boards and the ones acquired.
These smart cards use an 8-bit ATmega8515 microcontroller and have 512 bytes of Static Random Access Memory (SRAM) and of EEPROM, and 64K bytes of external memory. They are able to perform up to 16 million instructions per second (MIPS) at 16 MHz, operate at 4.5 V - 5.5 V and endure 10,000 write/erase cycles. To present the software components and the challenges encountered while developing the setup, the next section details the issues encountered.
Source code development
The smart cards that came with the SAKURA board only provided the compiled code and required a licence for the source code. This prevented modifications to their behaviour and implementation, such as trigger positioning/duration, changes to the AES algorithm, introduction of delays, modification of the communication protocol and other changes. Because of this, the decision was to implement the smart card software from scratch, to have more control and flexibility over what could be done with it.
Before starting to implement, the WB Electronics website, named Infinity USB, was checked for software examples for the smart card. On the website, source code examples for the PC side were available, but for the smart card only the compiled code was provided. We reached out to the company and asked if it could provide the source code for the smart card, since it was for academic use, and after some time they agreed and sent it. The source code was of great help, since it provided the program structure and the communication functions to transfer data from/to the smart card.
While waiting for the answer, the datasheet of the smart card's ATmega8515 microcontroller [28] was studied to understand how the microcontroller ports were organized and labelled, and which ports could be mapped to the smart card pins. On the software part, the 8515 I/O Application Programming Interface (API) was studied to understand what software ports were available and how they could be used. This later helped to better understand the source code and its functionality.
After obtaining the source code example, the next step was to program the new smart card software, using the functions from the provided source code to transfer and receive bytes of data. In the end, the goal was to have smart cards that were identical, in terms of functionality, to the smart cards provided with the SAKURA, to ensure compatibility with the SAKURA-provided software, but capable of being changed.
Programming
The source code of the smart card program needed to be tested to check if it was working properly, since sometimes the code provided is not the final one and has bugs. The source code was compiled using Atmel Studio 7.0 and the program was written to the smart card using the smart card writer named InfinityUSB. After that, the smart card was tested using the PC-side software provided on the InfinityUSB website and, after confirming that everything worked, the smart card programming started.
The program was adapted to have the same smart card command structure as the SAKURA smart cards (the APDUs), to ensure compatibility with the software provided on SAKURA's website. After this, a C implementation of the AES algorithm named tiny-AES128-C [29] was used because of its small size and simplicity, since the smart cards have limited resources. After the program was finished, the code did not fit in the internal memory of the smart card. The problem was that, on microcontrollers, every variable and vector is stored in RAM, and 512 bytes was not enough to store everything. The solution was to use the PROGMEM attribute to store the constant variables and constant vectors in the flash memory. The only difference is that, to access the values stored in these variables and vectors, the functions pgm_read_byte_near and pgm_read_byte need to be called.
After loading the compiled program into the smart card, the software was checked to assess that everything was working properly. To do this, the smart card writer was set to reader mode and the smart card was tested. After checking that the developed smart card software worked with a modified program from the infinityUSB website, the smart card was tested on the SAKURA-W.
On the SAKURA-W, a modified program provided on SAKURA's website was used. The functionality of this program was to check that the SAKURA-W smart cards were working properly, and the adaptation consisted in stripping the program of its interface, leaving only the communication part. However, when tested with the modified smart card, it did not answer the commands that were sent. The problem was found to be the clock frequency: the SAKURA-W's reader used a lower frequency than the one the smart card software was configured for.
Frequency Mismatch
After a careful inspection of the data transmission functions and the values being used, it was concluded that the smart card software was configured to work at 6 MHz. The SAKURA-W's smart card reader was clocked at 3.571 MHz, which at first should not make any difference, since the microcontroller can work at a lower frequency as well. The problem was in the receiving and sending functions, which were dependent on the CPU frequency. To explain the issue, a brief explanation of these functions is given below.
Data on the smart card is transmitted in serial mode, i.e. one bit at a time. The smart card reads/writes from/to the data pin, where the values can be either low or high (0 or 1), and there is a time window for both reading and writing each state. In order for the smart card to know when to read, a delay function is used: it receives a value, decrements it until reaching zero, and then returns. The function thus spends a certain amount of time decrementing, and this duration depends on the value passed to it and on the CPU frequency. A higher CPU frequency means faster instruction processing and a smaller delay for the same function value. The program came with a predefined value for the 6 MHz CPU and for the bit rate of 9600 Bits Per Second (BPS). Note that the bit rate must also be considered, because the more bits transmitted per second, the smaller the bit intervals will be. To calculate the new value for the SAKURA-W frequency, the following formula was inferred from the values already calculated in the code:
delay_value = cpu_frequency / (3 × bps)    (4.1)
The CPU frequency divided by three times the bps gives the number of decrement loops the function should perform. The factor of 3 appears because the delay function spends 3 CPU instructions, or clock cycles, on each decrement of the value.
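As a sanity check, Equation 4.1 can be evaluated for both clock frequencies (rounding the loop count to the nearest integer is an assumption, as the source does not state how fractional results were handled):

```python
def delay_value(cpu_hz: int, bps: int) -> int:
    """Eq. 4.1: number of 3-cycle decrement loops spanning one bit interval."""
    return round(cpu_hz / (3 * bps))

# Original firmware value (6 MHz clock, 9600 bps)
print(delay_value(6_000_000, 9600))  # -> 208
# Recomputed for the SAKURA-W reader clock (3.571 MHz, 9600 bps)
print(delay_value(3_571_000, 9600))  # -> 124
```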
After the new delay values had been calculated, the smart card was tested again on the SAKURA-W, and this time it responded to the commands, but it did not deliver the correct result, with extra values appearing in the smart card's answer. To try to solve the problem, the communication was inspected at the bit level using the oscilloscope, as described next.
Communication Debugging
To confirm that the new delay values were compatible with the SAKURA-W frequency, and to check whether the extra value in the result could be observed in the communication bits, the communication of the SAKURA smart cards and of the programmed one was compared. To do that, the developed smart card software was programmed to have the same command structure and to receive/send the same data as the card that came with the SAKURA board. This comparison was done at the bit level, comparing the time between each bit transmission, using the PicoScope and its capturing software.
The transmission protocol consists of one start bit at a low value, eight bits of data, one parity bit and two stop bits that stay at a high value. Comparing the two cards showed only a small timing difference, not enough to cause problems in the transmission, except in one case: when a call to the receive function was immediately followed by sending a byte, the software did not wait the time specified by the protocol for the stop bits.
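The character frame described above can be sketched as follows; the LSB-first bit order and even parity are assumptions based on the ISO 7816-3 direct convention, since the text does not specify them:

```python
def character_frame(byte: int) -> list:
    """Build one serial character frame: start bit (low), 8 data bits,
    a parity bit, and two stop bits (high)."""
    data = [(byte >> i) & 1 for i in range(8)]  # LSB first (assumed)
    parity = sum(data) % 2                      # even parity (assumed)
    return [0] + data + [parity] + [1, 1]

frame = character_frame(0xA5)
assert len(frame) == 12 and frame[0] == 0 and frame[-2:] == [1, 1]
```

Comparing the time between these 12 bit slots on the oscilloscope is what revealed the missing stop-bit wait.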
After fixing this problem and checking that the extra value was not present in the low-level communication, the smart card was tested again, and the extra value was still present in the result. The conclusion was that the error did not come from the data transmission, i.e. the problem was not in the smart card, and the extra value was being inserted by the PC-side software. To confirm this hypothesis, the PC-side code was rewritten from scratch, also in Python, to check whether the problem persisted. With the Python program finished, the extra value was gone, and the only thing left to do was the trigger. The development of the Python program is described in more detail in a later section.
Trigger Implementation
With the smart card programmed and communicating with the main Python program, the only thing left to do was to implement a trigger signal identifying when the AES operation starts. This ensures that the acquired traces are aligned, resulting in a more effective CPA analysis. The trigger also allows measuring how long an AES round takes, since it is difficult to identify the beginning and end of each round from the power traces alone. This measured time can then be used to configure the PicoScope capturing period.
The trigger was implemented by setting one of the auxiliary pins of the card to serve as a trigger. First, the desired port was identified with the help of the ATmega8515 datasheet; then the pin was set to output by setting the 5th bit of Data Direction Register (DDR) B to 1. Setting the pin high or low was then done by setting the 5th bit of the PORTB register to 1 or 0, respectively.
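In the AVR C firmware this amounts to `DDRB |= (1 << 5)` followed by setting or clearing the same bit of `PORTB`; the bit arithmetic can be mirrored in Python purely for illustration:

```python
def set_as_output(ddr: int, bit: int) -> int:
    """Set a Data Direction Register bit to 1 (pin becomes an output)."""
    return ddr | (1 << bit)

def drive(port: int, bit: int, high: bool) -> int:
    """Drive a PORT pin high or low by setting/clearing its bit."""
    return port | (1 << bit) if high else port & ~(1 << bit)

ddrb = set_as_output(0x00, 5)   # DDRB bit 5 -> output
portb = drive(0x00, 5, True)    # trigger ON
portb = drive(portb, 5, False)  # trigger OFF
```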
With the trigger implemented and set to go ON during the first AES round, a successful correlation analysis was performed to confirm it was working properly.
4.2
Trace Acquisition
This section presents and explains the components used for the trace acquisition and the challenges encountered during its development. The components addressed are the PicoScope, the oscilloscope used to measure the power consumption, and the main collecting software, which is used not only to configure and gather the power traces, but also to send and receive data to the device being measured. It is worth mentioning that the main gathering program is essentially the same for the smart card and for the FPGA, with two differences: i) the parameters for trace acquisition, which are adjusted to the signal being measured; ii) the interface used to communicate with the target device. Next, a brief description of the PicoScope is presented.
4.2.1
PicoScope
The PicoScope 6000 series [30] (Figure 4.2), in particular the 6404D, is a high-performance USB oscilloscope that, together with its software, turns a computer into an oscilloscope and spectrum analyser. This type of oscilloscope offers portability, performance and flexibility, and is programmable. It is configured and accessed via a PC, either using the software that comes with the device or through the PicoScope device driver.
The oscilloscope has 4 input channels, 8-bit signal resolution, 500 MHz bandwidth and up to 5 gigasamples per second (GS/s) in real time, shared among the 4 channels. It comes with a deep memory that can store acquisitions of up to 2 gigasamples. To handle that amount of memory and at the same time display the traces without compromising performance, the PicoScope includes a Hardware Acceleration Engine (HAL4) that guarantees both trace gathering and trace visualization without slowing down. The available voltage ranges are: ±50 mV, ±100 mV, ±200 mV, ±500 mV, ±1 V, ±2 V, ±5 V, ±10 V and ±20 V. It also has an integrated wave generator output, capable of generating standard and arbitrary waveforms.
Figure 4.2: PicoScope 6000 series.
4.2.2
Trace Gathering Program
The main trace gathering program was based on an open-source program by Colin O'Flynn, called pico-python [31]. This software works as a wrapper for the PicoScope's API, providing more convenient functions to access the PicoScope's functionality. The program is divided in 3 parts: PicoScope configuration, target device instrumentation and trace storage.
In the PicoScope configuration, first the communication driver is located and loaded. Then the PicoScope parameters, such as the trace length, signal amplitude and sampling frequency, are sent to the device. The input channels that are not going to be used are also turned off (by default, the PicoScope activates all channels) and the trigger is configured. In the device instrumentation, the plain texts can be loaded from a file or generated randomly by the program. The trigger is then armed and the program sends the plain text to the device, using an interface that differs between the SAKURA-G and the SAKURA-W. After the trace is gathered, the program receives the cipher text.
In the trace storage, the gathered trace is converted from the raw type to a 32-bit float type and then stored in the Matlab file format.
It is worth mentioning that, because 32-bit Python is used, there is a limitation on the amount of memory that can be allocated, meaning the program is not capable of holding large trace sets in memory. To overcome this problem, the program gathers a specified number of traces, stores them in a file, clears the memory and continues gathering new traces.
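A minimal sketch of the conversion and chunked-storage logic described above (the full-scale raw count of 32512 for this scope series is an assumption, and the per-chunk save, e.g. via scipy.io.savemat, is left as a comment):

```python
import numpy as np

PS6000_MAX_ADC = 32512  # assumed full-scale raw ADC count for this scope series

def raw_to_volts(raw: np.ndarray, v_range: float) -> np.ndarray:
    """Convert raw ADC counts to a 32-bit float voltage trace."""
    return (raw.astype(np.float32) / PS6000_MAX_ADC) * np.float32(v_range)

def gather_in_chunks(n_total: int, chunk: int, capture):
    """Capture traces in fixed-size chunks so 32-bit Python never holds them all."""
    for start in range(0, n_total, chunk):
        n = min(chunk, n_total - start)
        block = np.stack([raw_to_volts(capture(), 0.05) for _ in range(n)])
        yield block  # caller stores the block (e.g. in a .mat file) and frees it
```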
4.2.3
SAKURA-G Communication Interface
The interface used by the trace gathering program to communicate with the SAKURA-G was based on program source code from the SAKURA website [32]. This program, named SAKURA-G checker, communicates with the board to verify that it is working properly, by sending plain texts and checking their output.
After inspecting the SAKURA-G checker, it was noticed that the code used a DLL written in C# to communicate with the SAKURA-G, and that it had a considerable amount of code performing the sending and receiving operations. The use of a C# DLL is relevant because the gathering script was written in Python, and the implementation being used, standard Python, does not support loading C# DLLs or programs. Also, even if a C/C++ DLL had been found, the time that would be spent rewriting and then debugging it was, at the time, not worth it when a working example was available that just needed some adaptations.
Taking these two points into account, the decision was to strip the code of its graphical interface and adapt it to receive instructions via the console, leaving the communication with the SAKURA board as it was. The result was a console program that accepts 3 types of commands: one command to change the secret key, and two to cipher data, with the difference that one returns the result and the other does not.
After having the communicator program working, the final step was to integrate it with the main gathering script. This was achieved by calling the executable with parameters such as the command, the number of the communication channel where the SAKURA was connected, and either the key or the plain text. The result of this call would be the cipher text.
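The integration can be sketched as below; the executable name and argument order are hypothetical, as the thesis does not list the exact command-line interface of the adapted checker:

```python
import subprocess

def build_command(exe: str, command: str, channel: int, data_hex: str) -> list:
    """Assemble the console invocation of the stripped-down checker.
    The executable name and argument order are hypothetical."""
    return [exe, command, str(channel), data_hex]

# Example use from the gathering script (paths/names are placeholders):
# cmd = build_command("sakura_g_checker.exe", "cipher", 3, "00" * 16)
# result = subprocess.run(cmd, capture_output=True, text=True)
# cipher_text = result.stdout.strip()
```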
4.2.4
SAKURA-W Communication Interface
The interface used by the trace gathering program to communicate with the SAKURA-W was also based on a program from the SAKURA website, named SAKURA-W checker [33]. At first, a strategy similar to the SAKURA-G program was adopted, i.e. transforming the existing code into an executable that receives commands. When the program was tested, it did not work and, with the help of an oscilloscope, the executable was found to reset the smart card after sending it a plain text. This behaviour caused the trigger to be set to ON at startup, making the gathering program record the smart card initialization instead of the AES rounds. The solution was to code all the handling and sending of the smart card APDUs in Python.
In contrast with the SAKURA-G checker, which used a C# DLL to communicate with the board, the SAKURA-W did not require any DLL, since the communication was performed via a virtual COM port. The code used to communicate was also simpler than that of the SAKURA-G checker. After finishing the Python implementation of the SAKURA-W communicator, the first tests were performed, but the smart card did not seem to answer back. After confirming with the oscilloscope that the commands were being delivered correctly, by observing the bit transmission to the smart card, one of the smart cards that came with the board was tested instead, with the same result. The behaviour of keeping the trigger pin high had been observed during the smart card initialization stage, which indicated that the smart card might be stuck in initialization.
Further analysis revealed that, in the configuration code that sets up the communication parameters of the virtual COM port, the smart card received a reset (RST) signal for 200 ms and then waited 500 ms before requesting the Answer To Reset (ATR). This served as a hint that the Python module used to communicate with the COM port, named pyserial, was somehow stuck sending the RST signal even after it was explicitly coded not to. After trying all the commands that could possibly turn the signal off, without any change in the smart card state, the idea arose to inspect the COM lines to assess whether the problem was indeed the reset state. To do that, the virtual COM lines were inspected to check whether the RST was still active when it should not be.
To analyse the virtual COM port, it is important to understand its communication protocol, in this case RS-232. RS-232 is used to transfer digital data between a Data Terminal Equipment (DTE) and a Data Communication Equipment (DCE) one bit at a time, i.e. it performs serial data transmission. The next step was to create two virtual COM ports, one connected to the communicator program and the other connected to a terminal. The communicator programs compared were the modified SAKURA-W checker and the Python program used in the smart card's trace gathering program. The idea was to compare, on the terminal, the data and the state flags being passed during the communication.
Comparing the two programs revealed that the data being passed was the same, but the state of the flags was different. The flags displayed, representing the status of the pins, were Carrier Detect (CD), Clear To Send (CTS), Data Set Ready (DSR) and Ring Indicator (RI). With the SAKURA-W checker, written in C#, only CTS was ON after the initialization; with the smart card communicator, written in Python, CTS, CD and DSR were ON. This clearly indicated that something was not right, since the flags should be the same in both cases.
A possible fix was to try the latest version of pyserial, 3.2.1, instead of 2.7, but that version was only available for Python 3 and the version being used was Python 2. Fortunately, the migration raised only small incompatibilities, which were easily corrected; the flag state problem was solved and the program started to communicate normally with the smart card.
4.3
Signal Analysis
After the traces have been collected and stored, they need to be processed so they can provide meaningful information. The scripts developed during this work were: the correlation power analysis, to extract the secret key from the traces; the t-test analysis, to find points of interest with a low number of traces; and the signal-to-noise ratio, to assess and compare which setups provide better signal quality. The software used to develop these scripts was Matlab, since it offers programming flexibility, provides many of the functions needed by the scripts, and includes tools to present the data, such as graphs. The following presents and explains in more detail what the scripts do and how they were developed.
The correlation script correlates the hypothetical power consumption of all possible keys with the measured power traces, using two power models: Hamming-Weight and Hamming-Distance. In terms of attackable AES rounds, the script can attack the first and last rounds, which are the most common targets, but it can easily be extended to attack other rounds. This script was based on exercises from the rozvoj website [34].
In terms of structure, the script can be divided in four parts. The first is the data input, where the power traces are loaded into memory and the script is adjusted to the trace characteristics, such as the number of samples, the number of traces, the sample averaging and others. The second part is the construction of the hypothetical power consumption, where a matrix is built that, from the plain text/cipher text pair, generates the hypothetical values for one byte of the key for every key guess, using the Hamming-Weight or Hamming-Distance model. Next is the correlation attack itself, where the collected traces are compared with the hypothesis matrix, producing the correlation matrix. Finally, the correlation matrix is analysed and the data is output, such as the key and the correlation coefficients of the top 5 key guesses.
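The four-part structure can be illustrated with a compact sketch of the attack on a single key byte, using the Hamming-Weight of the first-round S-box output (purely illustrative; the actual script was written in Matlab). The S-box is built from its algebraic definition to keep the sketch self-contained:

```python
import numpy as np

def _sbox():
    """Build the AES S-box from its definition: GF(2^8) inverse + affine map."""
    def mul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a = (a << 1) ^ (0x11B if a & 0x80 else 0)  # reduce mod x^8+x^4+x^3+x+1
            b >>= 1
        return r
    def inv(x):  # multiplicative inverse as x^254 (maps 0 -> 0)
        r, base, e = 1, x, 254
        while e:
            if e & 1:
                r = mul(r, base)
            base = mul(base, base)
            e >>= 1
        return r
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return np.array([inv(x) ^ rot(inv(x), 1) ^ rot(inv(x), 2)
                     ^ rot(inv(x), 3) ^ rot(inv(x), 4) ^ 0x63
                     for x in range(256)])

SBOX = _sbox()
HW = np.array([bin(v).count("1") for v in range(256)])  # Hamming-Weight table

def cpa_key_byte(traces: np.ndarray, plaintexts: np.ndarray) -> int:
    """Return the key-byte guess with the highest absolute Pearson correlation."""
    # Hypothetical consumption: HW of S-box output, one column per key guess
    hyp = HW[SBOX[plaintexts[:, None] ^ np.arange(256)]].astype(float)
    h = hyp - hyp.mean(axis=0)
    t = traces - traces.mean(axis=0)
    # Correlation matrix: key guesses x trace samples
    corr = (h.T @ t) / np.sqrt((h**2).sum(0)[:, None] * (t**2).sum(0)[None, :])
    return int(np.argmax(np.abs(corr).max(axis=1)))
```

With simulated traces that leak the Hamming-Weight of the S-box output plus Gaussian noise, a few hundred traces suffice to recover the key byte.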
This script was later extended to also perform incremental CPA, where the number of traces being correlated is progressively increased, generating a picture that shows the evolution of the correlation with the number of traces. The other extension was the partial correlation, where only a certain number of traces is correlated at a time, allowing to check whether there is some variance among the traces collected.
4.3.1
Signal-to-noise ratio
The SNR script was developed to efficiently measure the signal quality and output a value that can be used to compare, for example, different trace gathering setups. The script was based on the one used in the book "Power Analysis Attacks: Revealing the Secrets of Smart Cards" [5], which explains how to calculate the signal-to-noise ratio of one sample point in a trace.
SNR = signal / noise,    SNR_dB = 10 × log10(SNR)    (4.2)
For the script to work, the user needs to know which values are being processed at a given time. To do that, pre-generated plain texts must be used that produce the desired Hamming-Weights at a determined AES round operation, in this case the S-box output. Then, this operation needs to be located in the trace, or during the device operation, with the use of a trigger for example.
After having the plain texts and the operations located, the traces are grouped into nine sets according to their Hamming-Weight values, from 0 to 8, and each group is processed individually. The operations performed on each trace group are the average, to retrieve the signal, and the standard deviation, to estimate the electronic noise. Finally, the signal is divided by the noise, yielding the SNR. If the value is desired in decibels (dB), the base-10 logarithm is applied and the result multiplied by 10.
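Following the per-group procedure just described, the computation for a single Hamming-Weight group can be sketched as below (illustrative only; the actual script was written in Matlab and repeats this for each of the nine groups):

```python
import numpy as np

def group_snr_db(traces: np.ndarray):
    """SNR per sample point of one Hamming-Weight group (Eq. 4.2):
    signal = per-sample average, noise = per-sample standard deviation."""
    signal = traces.mean(axis=0)
    noise = traces.std(axis=0)
    snr = signal / noise
    return snr, 10 * np.log10(snr)  # linear ratio and its dB value
```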
4.3.2
T-test Analysis
4.3.3
Plain Text Generation
Both the SNR script and the semi-fixed vs. random t-test use a set of plain texts that produce a specified intermediary value, in this case a Hamming-Weight, at a specific round operation. A script, based on a Matlab implementation of AES [35], was built to generate these plain texts.
There were two possible ways to implement this. The first was to generate random inputs, perform an AES ciphering and check the value generated at the specified location. The second was to define the intermediary value and then perform the deciphering operation from the targeted AES round, i.e. performing only the number of deciphering rounds equal to the targeted round. In the end, the second method was chosen, because it gives more control over the intermediary values being generated in a specific round.
The intermediary values generated are taken after an S-box, and can be generated for any AES round. When a plain text is generated, it is stored in one of 9 files, each representing one of the Hamming-Weights from 0 to 8.
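For round 1, where the S-box input is the plain text XORed with the key, the inverse approach reduces to inverting the S-box, as sketched below for a single byte (the key byte shown is hypothetical, and the S-box is again built from its algebraic definition for self-containment):

```python
def _sbox_table():
    """AES S-box from its definition: GF(2^8) inverse followed by affine map."""
    def mul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a = (a << 1) ^ (0x11B if a & 0x80 else 0)
            b >>= 1
        return r
    def inv(x):  # x^254 = multiplicative inverse (0 -> 0)
        r, base, e = 1, x, 254
        while e:
            if e & 1:
                r = mul(r, base)
            base = mul(base, base)
            e >>= 1
        return r
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return [inv(x) ^ rot(inv(x), 1) ^ rot(inv(x), 2)
            ^ rot(inv(x), 3) ^ rot(inv(x), 4) ^ 0x63 for x in range(256)]

SBOX = _sbox_table()
INV_SBOX = {s: x for x, s in enumerate(SBOX)}

def plaintexts_for_hw(key_byte: int, target_hw: int) -> list:
    """All plaintext bytes whose round-1 S-box output has the target Hamming-Weight."""
    return [INV_SBOX[v] ^ key_byte
            for v in range(256) if bin(v).count("1") == target_hw]
```

Each resulting list would then be written to the file of its Hamming-Weight, e.g. Hamming-Weight 8 yields exactly one plain text per key byte, since only 0xFF has eight set bits.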
4.4
Setup Usage
A user wanting to use this setup needs to perform a few steps before starting to collect traces. First, depending on whether the SAKURA-G or the SAKURA-W is used, the programming of the FPGAs differs: for the SAKURA-G, the two FPGAs need to be programmed using the Xilinx Platform Cable USB; for the SAKURA-W, only the controller is required. Note that the main circuit FPGA only needs to be programmed once, even when changing between SAKURA-G and SAKURA-W.
After that, one SMA cable needs to be connected between the board and the PicoScope, to measure the power consumption, and a probing cable needs to go to the trigger pin. On the SAKURA-G there are two options for power measurement: to use the integrated amplifier, the cable is plugged into SMA J3; to use no amplifier, or an external one, SMA J2 is used. For the SAKURA-W the measuring point is SMA J2. For the trigger on the SAKURA-G, a probe cable is connected to pin 3 to gather all AES rounds; for the SAKURA-W, the 8th pin, counting from left to right on the second row of the 40-pin header, is used, because it connects to the smart card pad AUX 1.
Next, the user configures the main gathering script with parameters such as the number of traces, the signal amplitude, the recording duration and the sampling frequency. One way to obtain these parameters is to use the software that comes with the PicoScope, which shows the signal being gathered in real time and allows adjusting the parameters for optimal results. Then, after the main program is configured, the trace gathering starts and, when it finishes, the user chooses one of the Matlab scripts to perform signal analysis or the attack. Figure 4.3 illustrates the setup using the SAKURA-W.
Figure 4.3: The complete setup using the laboratory power supply, SAKURA-W on top of the SAKURA-G, and PicoScope.
4.5
Conclusion
This chapter illustrated how the overall trace gathering setup is organized. First, an overview of the 3 main components of the setup was presented, together with their functionality. Then each component was described in more detail, explaining some of the development challenges and decisions. The first component was the processing unit, for which the two types of devices used, the SAKURA-G for the FPGA and the SAKURA-W for the smart card, and the work developed on them were explained. The second component was the trace acquisition, where the PicoScope was introduced, as well as the trace gathering program. The third was the signal analysis, for which 3 scripts were developed, explaining their functionality and structure.
5
Experimental Results and
Evaluations
Contents
5.1 Different Setup Configurations
5.2 T-Test
5.3 Conclusion
This section evaluates the proposed setup, described in the previous chapter. The evaluation shows how different setup configurations affect the gathered smart card power traces. It first analyses the traces' SNR and the respective Hamming-Weight leakage. Following this, a CPA attack is performed, first increasing the number of traces progressively and evaluating how the results evolve with the number of traces, and then measuring the results using groups of 50 traces.
5.1
Different Setup Configurations
The setup has two main fixed physical components used to gather traces, the target device and the oscilloscope, which in this case were the SAKURA-W (smart card) and the PicoScope. Other components were also considered, to improve the overall signal quality, namely an external amplifier (Minicircuits ZFL-1000LN+) and a DC blocker (API 8037), a device used to filter out the DC component. To assess the impact on the signal quality, three setups were considered using these components:
• Setup 1 - Amplifier, DC blocker and oscilloscope in AC mode.
• Setup 2 - DC blocker and oscilloscope in AC mode.
• Setup 3 - Only using oscilloscope in AC mode.
The oscilloscope also has a DC mode, but this one was not considered: since the DC component was being filtered out by the DC blocker, using the DC mode or the AC mode was equivalent.
For the SNR measurement, the oscilloscope was set to take samples at a rate of 1.25 GS/s (gigasamples per second) for a duration of 5×10⁻⁶ s, which led to 6250 samples, each power trace occupying 25 KB on file. For the other measurements, the oscilloscope was set to take samples at a rate of 1.25 GS/s for a duration of 3.38×10⁻⁴ s, which led to 422500 samples, each power trace occupying 1.6 MB on file.
5.1.1
Signal-to-Noise Ratio
the EEPROM. This power consumption contributes to a noticeable distinction between the Hamming-Weights, improving the SNR values, since the algorithm measures how distinguishable the different signals are from the noise. Each of the 9 Hamming-Weights was measured 200 times, to ensure the noise could be removed by averaging and characterized with the standard deviation.
Two evaluation metrics are used: the SNR averaged over 2 clock ticks, where the Hamming-Weights have more stable values, and the signal distinction, measured as the distance between the highest and lowest Hamming-Weights (8 and 0 respectively). This distance is measured in quantization values, since the captured signals have different amplitudes in terms of voltage, so that an equal comparison can be made in terms of signal distance. The averages of the traces with the highest and lowest Hamming-Weights (0xFF vs 0x00) are calculated and their difference used to measure the distance between the two signals.
The results obtained for the 3 setups depicted in Figures 5.1, 5.2 and 5.3 suggest
that: setup 1
had a SNR average of 29.41 dB and a distance of 2.64, (Hamming-Weight 0 and 8), on
clock 1 (from
sample 20 to 180) and a SNR average of 25.41 dB and a distance of 2.43 on clock 2
(from sample
200 to 360); setup 2 had a SNR average of 16.99 dB and a distance of 1.54 on clock
1 and a SNR
average of 13.55 dB and a distance of 0.07 on clock 2; setup 3 had a SNR average of
18.95 dB and
a distance of 1.21 on clock 1 and a SNR average of 13.97 dB and a distance of 0.06
on clock 2.
Figure 5.1: Setup 1: Left image shows the SNR of the power traces shown on the
right.
From these results, it can be concluded that setup 1 provided both the highest SNR and the largest distance between Hamming-Weights 0 and 8, while setups 2 and 3 had similar results, with setup 2 having the lowest SNR but a better distance than setup 3. Considering the proposed metrics, the overall setup can benefit from an external amplifier with a DC blocker when capturing smart card traces. It is worth mentioning that setups 1 and 2 might be further improved if another DC blocker were used: the minimum pass frequency of the DC blocker is 10 MHz, possibly causing some information loss, since the smart card works at 3.571 MHz. Also, when the SNR analysis was performed on a larger trace portion, it was noticed that some places, besides the ones analysed before, showed spikes in the SNR. These spikes might carry useful information, but at the moment it is not known what that information is.
Figure 5.2: Setup 2: Left image shows the SNR of the power traces shown on the
right.
Figure 5.3: Setup 3: Left image shows the SNR of the power traces shown on the
right.
5.1.2
CPA Attack
This section evaluates the same 3 setup configurations, but now using CPA, to assess whether the results obtained corroborate the analysis of the previous section.
To test the setups, 2 tests are performed. First, an incremental CPA, where the number of traces is incremented 10 at a time up to 500, and the distance between the correct key and the average of the other key guesses is measured. Second, as a complementary test, the quality of the gathered traces is measured by correlating 50 traces at a time and checking how much they vary from start to end. The traces used for these tests cover the first AES round and use the same smart card tested in the previous section.
The results of the incremental CPA test, assessing the correlation value with 50, 100, 200 and 500 traces, show that setup 1 had an average correlation distance of 0.52, setup 2 of 0.40, and setup 3 of 0.14, as depicted in Figure 5.4.
These results suggest that the best proposed setup is indeed the one that presented
the best
distinction between the correct key byte and the other guesses, namely setup 1. A
more interesting
Figure 5.4: Difference between the correct key and the average of other keys
hypothesis.
result was between setups 2 and 3: in the previous section, setup 3 had the better SNR while setup 2 had the greater distance, yet the incremental CPA was much better on setup 2. One possible explanation is that the distance between Hamming-Weights is more relevant than the SNR for CPA attacks. Another is that the other spikes seen in the SNR might be leaking information.
One relevant observation in Figure 5.4 is that there was a decrease in the correlation values at the start for setups 2 and 3. This may mean that these setups had an adaptation period, where the first traces had worse quality than the following ones. This leads to the second test, with CPA performed over groups of 50 traces at a time, to assess whether the trace quality changes with the time spent capturing. The results obtained are depicted in Figures 5.5, 5.6 and 5.7.
As can be observed, the assumption that the first traces have worse quality than the following ones holds for both setups 2 and 3. One reason for this may be an adaptation period of the oscilloscope: with setup 2 the oscilloscope has to use its smallest voltage range, ±50 mV, which might bring in more noise that is then progressively filtered out. Setup 3 might have the same problem as setup 2, plus the filtering of the DC component by the oscilloscope's AC mode. By inspecting the first traces gathered using setup 3, one can observe a period where the DC is progressively removed. Figure 5.8 shows the first 10 traces gathered with setup 3.
Figure 5.5: Partial CPA from setup 1.
As can be observed, the first traces measured by the oscilloscope still contain some of the DC component, which is then gradually removed until the traces stabilize closer to 0 mV.
After analysing and selecting the best setup configuration, a full CPA analysis was performed, and all key bytes were found correctly with 25 traces, with a correlation distance of 0.15 between the correct key bytes and the average of the other key guesses. The conclusion of this section is that the setup benefits from an external amplifier with a DC blocker, leading to a larger correlation distance between the correct key and the other key guesses.
Figure 5.8: First 10 traces gathered from setup 3, showing the progressive adaptation of the oscilloscope's AC mode.
5.1.3
Power Supply Comparison
The SAKURA-(G/W) platform can be powered through the communication USB cable or an external power supply. Depending on the power source used, the acquired power traces can contain more or less noise. An interesting test is therefore to use different power sources and see the impact they have on the trace quality. Intuitively, a noisy power supply should impact the power traces negatively.
For this test, three power sources were used: an adjustable laboratory power supply, a wall charger, and the USB cable. Given the results obtained in the previous sections, setup 1 is used to gather the traces. The first test consists of measuring, with an oscilloscope, the distance between the maximum and minimum voltage peaks with the power supplies unloaded. Then, each power supply is compared on an incremental CPA using 500 traces, to assess the real impact it has on the CPA.
In the first test, measuring the distance between the minimum and maximum voltage peaks, the laboratory supply showed a distance of 0.11 V, the wall charger 0.063 V, and the USB power supply 0.551 V, as depicted in Figures 5.9, 5.10 and 5.11.
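The peak-to-peak measurement is simply the difference between the extreme voltages of the captured trace; a sketch with made-up sample values (not the actual oscilloscope data):

```python
import numpy as np

def ripple(trace):
    """Peak-to-peak ripple: distance between the maximum and minimum voltage."""
    trace = np.asarray(trace, dtype=float)
    return float(trace.max() - trace.min())        # same as np.ptp(trace)

lab_supply = np.array([0.05, -0.06, 0.04, -0.05])  # illustrative values in volts
print(ripple(lab_supply))                          # about 0.11 V
```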
This shows that the USB power supply has the highest noise of the three power sources. It is thus expected to perform worse than the other two. Surprisingly, the wall charger had the lowest ripple noise, even when compared to the more expensive adjustable laboratory power supply. One thing to keep in mind is that these results only give an idea of how much fluctuation the three power supplies have, since they were measured with no load. A more accurate test would be to measure the amount of noise the power supplies induce when current is drawn from them. The second test consists of the incremental CPA evaluation, as depicted in Figure 5.12.
Figure 5.9: Power trace of the laboratory power supply with no load.
The obtained results suggest that the best power supply was the adjustable laboratory power supply, with about 0.05 V of distance from the other two power supplies from trace 200 onwards.
Despite the wall charger having the lowest ripple noise of the three, it performed identically to the USB power supply. As mentioned before, the power supplies were tested with no load, so it could be the case that, when the device starts drawing power, the noise increases. It can also be noticed that the power supplies do not significantly affect the performed analysis.
Figure 5.12: Difference between the correct key and the average of other keys.
5.2
T-Test
In the previous section, a CPA was performed, showing that it was possible to recover the secret key with a correlation distance of about 0.15 between the correct key byte guesses and the average of the other key guesses. Using only 25 traces to recover the secret is a low number; however, the smart card used has no protection mechanisms and has its S-box stored in the EEPROM, further increasing the Hamming-weight leakage.
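The Hamming-weight hypothesis mentioned here targets the first-round S-box output; a sketch of the model follows, where an identity table stands in for the real 256-entry AES S-box and the plaintext/key bytes are arbitrary examples:

```python
def hamming_weight(v):
    """Number of set bits -- the assumed power model for the S-box lookup."""
    return bin(v).count("1")

def hw_hypothesis(plaintext_byte, key_guess, sbox):
    """CPA hypothesis: Hamming weight of the first-round S-box output."""
    return hamming_weight(sbox[plaintext_byte ^ key_guess])

identity_sbox = list(range(256))   # placeholder for the real AES S-box table
print(hw_hypothesis(0x53, 0x2B, identity_sbox))  # HW of 0x53 ^ 0x2B = 0x78 -> 4
```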
When dealing with a more protected smart card, thousands of traces could be necessary to compromise it. Since performing the CPA computation on large traces can be computationally demanding, a better approach is to first discover the portions of the trace that could potentially leak information, using a statistical tool that requires neither many traces nor much processing time, thereby reducing the number of samples that need to be correlated. Since it was previously shown that setup 1 produced the best traces, it was the one chosen to perform this test.
This section looks into Welch's t-test, a statistical tool that can be used to find these points of interest. The test was performed by creating two data sets: one that produces a Hamming weight of 0 on the first S-box of the first round, and another with random values. Then, 20 traces of the first AES round were gathered for each data set and the t-test was performed. To assess the t-test's accuracy, a CPA attack was also performed using the random data set. This makes it possible to infer whether the t-test distinguishes the points of interest faster than the CPA and whether they match.
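Welch's t-statistic is computed independently at every sample point between the two trace sets; a NumPy sketch with synthetic traces, where the leak location and magnitude are invented for illustration:

```python
import numpy as np

def welch_t(set_a, set_b):
    """Welch's t-statistic per sample point between two trace sets
    (e.g. fixed-input traces vs. random-input traces)."""
    a = np.asarray(set_a, dtype=float)
    b = np.asarray(set_b, dtype=float)
    var_a = a.var(axis=0, ddof=1) / len(a)   # variance of the mean estimate
    var_b = b.var(axis=0, ddof=1) / len(b)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(var_a + var_b)

# Synthetic example: 20 traces per set, leak injected at sample 300
rng = np.random.default_rng(1)
fixed = rng.normal(size=(20, 1000))
random_set = rng.normal(size=(20, 1000))
random_set[:, 300] += 3.0                 # the two sets differ in mean here
t = welch_t(fixed, random_set)
print(int(np.abs(t).argmax()))            # the leaking sample stands out
```

Sample points where |t| exceeds the commonly used 4.5 threshold are then flagged as candidate points of interest.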
For this measurement, the oscilloscope was set to sample at a rate of 1.25 GS/s for a duration of 3.38×10−4 s, which leads to 422,500 samples, with each power trace occupying 1.6 MB on file.
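As a quick sanity check, the quoted sample rate and capture window reproduce the reported trace length; the 4-byte-per-sample storage size below is an assumption used only to relate the length to the quoted file size:

```python
sample_rate = 1.25e9               # 1.25 GS/s
duration = 3.38e-4                 # capture window in seconds
samples = round(sample_rate * duration)
print(samples)                     # 422500, matching the reported trace length

# Assuming 4-byte floats per sample, one trace is about 1.7 MB on disk,
# in line with the roughly 1.6 MB per trace reported above
print(round(samples * 4 / 1e6, 2))
```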
The results using 10, 15, 20 and 25 traces are shown in Figures 5.13, 5.14, 5.15
and 5.16.
Figure 5.13: Left image shows the correlation result and right image the t-test
using 10 traces.
Figure 5.14: Left image shows the correlation result and right image the t-test
using 15 traces.
Figure 5.15: Left image shows the correlation result and right image the t-test
using 20 traces.
Figure 5.16: Left image shows the correlation result and right image the t-test using 25 traces.
From the results obtained, the t-test seems to be a useful tool to locate points of interest much faster than CPA, but it is worth mentioning that further analysis needs to be done using a more protected smart card.
5.3
Conclusion
This section presented three analysis setup configurations, in order to assess which one provides the best CPA results. First, the SNR and the distance between the maximum and minimum Hamming weights, measured on a trace portion with a large and noticeable leakage, were proposed as the metric to choose the best setup. Then, to confirm this metric, the CPA was performed using the same setups, and the result of the best setup was confirmed. Finally, to reduce the number of samples processed by a CPA attack, the t-test can be used as a faster way to find points of interest, since the POIs showed up much faster and matched the ones found by the CPA attack.
6
Conclusions and Future Work
Contents
6.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . lxxi
Smart cards are a common asset used in our daily lives. Their applicability ranges from transportation, health, payments and telecommunications to identification, among other areas. These smart cards provide several tamper-proof and unauthorized-access protection mechanisms, making them appropriate for storing sensitive information, such as secret and private keys.
Power analysis is an effective way to retrieve sensitive information from a smart card. SPA attacks are based on the visual inspection of one or a few power traces. They can provide information about which operations were performed by the smart card, and in some particular cases even expose the secret key. In most cases, the attacker needs to know some smart card implementation details to be successful. DPA, depending on the trace quality and the protection mechanisms of the smart card under attack, might require a considerable number of traces. This type of attack has the advantage of being easy to automate, since it is based on finding correlations over specific points of a power trace. It also has the advantage of not requiring detailed knowledge of the smart card's implementation to be successful.
The proposed and implemented solution was to configure a setup that allows the user to perform side-channel analysis on both smart cards and FPGAs. The main hardware used was the SAKURA-G/W and a PicoScope PC oscilloscope. To have the setup operating, four software components were also developed:
• The software used to program the smart card, containing the AES algorithm, the communication and a customizable trigger.
• The trace gathering program, responsible for configuring the picoscope, communicating with the target device and gathering/storing the traces.
• The SAKURA-G and SAKURA-W communication program, used by the trace gathering program to deliver and receive data from the devices.
• The scripts used for the trace analysis (CPA, SNR, t-test), responsible for analysing the traces and presenting the data in a way understandable to the user, such as graphics and data comparisons.
The implemented solution was then analysed to assess which configuration gives the best CPA results, since the quality of the traces influences the attack's success. The experimental results suggest that the setup with external amplification and an adjustable laboratory power supply was the one that achieved the best results. This conclusion was deduced from the metrics defined in the evaluation, where the SNR and the correlation value from the CPA were used. Using the best setup configuration, it was possible to recover the secret key from an unprotected smart card using only 25 power traces.
Finally, a t-test analysis was performed and compared with a CPA analysis. As expected, the t-test revealed information leaks faster than the CPA, but it also seemed to provide some false-positive points of interest.
6.1
Future Work
The work developed in this thesis used as its core the CPA algorithm on unprotected cryptographic devices, in this case a smart card. If cryptographic devices that come with protection mechanisms are used, these methods might not be enough to compromise the device.
As future work, it would be interesting to see how the CPA attack performs against the defence mechanisms mentioned in Chapter 3. Also, more advanced techniques, such as template and collision attacks, could be tested to assess their effectiveness against protected and unprotected devices.
The methodology used to select the best setup was applied to the smart card; another interesting test would be to do the same for the FPGA and see if the results match. Other types of setups could also be tested to see if the overall CPA improves; for example, changing the DC blocker to one with a lower cut-off frequency might improve the results.
Finally, it would be interesting to use smart cards that are available on the market and assess their security against this type of attack. To do this, the communication protocol used by each card would have to be discovered, so that the trace collecting program can communicate with it.