Notes On Cryptography 2014
Notes on Cryptography
by MICHELE ELIA
A.A. 2014-2015
Preface
Cryptography concerns the principles, protocols, and methods for protecting information against deliberate or accidental modifications (alterations). The protection is achieved by means of transformations, called enciphering, whose aim
is to conceal the information, and by inverse transformations, called deciphering,
to re-obtain the useful information. As Francis Bacon's fine irony points out, the
entire set of procedures is nonsense but, as things stand in this world, it is very
useful.
For millennia, cryptography was mainly used in diplomacy and military affairs,
but cryptographic techniques have recently also been introduced into the bureaucratic, managerial, and economic sides of civil life.
Cryptographic techniques can be applied in data transmission, computation and
storage systems. The design of any information protection system may be seen
as the engineering way to solve the philosophically insoluble conflict between
security and the resources needed to achieve it. Therefore, any cryptographic
system is a compromise between functional requirements, scientific knowledge,
logistics, technological possibilities, and economic costs. The evaluation of an
information protection system must relate to the combination of these resources,
taking a pragmatic view that excludes both the ingenuity and the presumption
typical of factual knowledge and improvisation.
In this scenario, cryptography is only one component, although an indispensable
one, of any information protection system. A knowledge of it, even if limited
to basic techniques, is an absolute requisite for professionally and successfully
managing any system that needs security.
These Notes are a schematic collection of the basic principles, axioms, and methods of cryptography. They may constitute a didactic support to the lectures, but
should be supplemented by reference texts and manuals, among which the following,
[39], [27], [48], and [59], are suggested.
These Notes grew out of a graduate course in Cryptography I gave for many years
at the Polytechnic of Turin. Their writing took more time (and more years) than
I would have hoped. I must thank the passionate students whose youthful enthusiasm and interest in the challenging subject of cryptography have stimulated
my teaching, including in obscure times and in very difficult conditions. Unfortunately, their hope for written notes has been satisfied too late; however, they have my sincere and permanent gratitude for their anonymous, unrewarded, but warm and tangible support. I am indebted to Frances Cooper for
her professional and friendly revision of the English, a revision that has also greatly
improved the presentation from a logical point of view. I also want to thank Dr.
Guglielmo Morgari (of Telsy) for his careful reading and for pointing out many
typos and mistakes. The final technical quality owes much to his professional and
deep knowledge of the subject. Obviously, any error or questionable viewpoint
is my responsibility alone, and due to my own many limitations.
Texts
J. Hoffstein, J. Pipher, J.H. Silverman, An Introduction to Mathematical Cryptography, Springer, New York, 2008.
N. Koblitz, A Course in Number Theory and Cryptography, Springer, NY, 1987.
F. Fabris, Teoria dell'Informazione, Codici, Cifrari, Boringhieri, Torino, 2001.
R. Mollin, An Introduction to Cryptography, CRC, NY, 2007.
Manuals
A.J. Menezes, P.C. van Oorschot, S.A. Vanstone, Handbook of Applied Cryptography, CRC, New York, 1997.
Contents
1 Cryptography from Art to Science
  1.1 Introduction
  1.2 Information Protection
  1.3 Historical glimpses

2 The Shannon theory of secrecy systems
  2.1 Introduction

7 Electronic signatures
  7.1 Introduction
      7.1.1 Electronic signature of an electronic document
  7.2 Components of Electronically Signed Documents
      7.2.1 Document
      7.2.2 Standard hash function SHA-1
  7.3 Signature based on RSA
  7.4 Signature based on the Rabin scheme
  7.5 Signature based on El Gamal
  7.6 Blind signature
  7.7 Secret Sharing - Shamir

8 Complexity
  8.1 Introduction
      8.1.1 A heuristic view of computational complexity
  8.2 Complexity: the Heart of Cryptography
      8.2.1 One-way functions
  8.3 Arithmetic complexity
      8.3.1 Complexity of product and exponentiation
      8.3.2 Finite Field Arithmetics
  8.4 Factorization complexity
      8.4.1 Factorization in Z
  8.5 Discrete logarithm
      8.5.1 Discrete logarithm as one-way function
      8.5.2 Discrete Logarithm Complexity
      8.5.3 Shanks' Bound
  8.6 Searching Unsorted Data (SUD)

9 ECC
  9.1 Introduction
  9.2 Elliptic Curves and Group Law
      9.2.1 Group Law
  9.3 EC over Finite Fields
  9.4 EC Public-key Schemes
  9.5 Arithmetics and complexity in ECC
  9.6 Historical Notes
      9.6.1 The origins

10 Cryptanalysis
  10.1 Introduction
  10.2 Axioms
  10.3 Cryptanalysis of secret-key systems
      10.3.1 Cryptanalysis of classic schemes
  10.4 DES Cryptanalysis
  10.5 Cryptanalysis of Public Key Systems

12 Steganography
  12.1 Introduction
  12.2 Some historical notes
  12.3 Steganographic channel models
  12.4 Concealment issues
      12.4.1 Examples, Simulation, and Results
  12.5 Conclusions
Chapter 1
Cryptography from Art to Science
Some people are so busy
learning the tricks of the trade
that they never learn the trade.
VERNON LAW (Pittsburgh Pirates pitcher)
1.1 Introduction
It is a fact of recent history that, in the last two decades of the twentieth century, a scientific, technological, and cultural revolution swept through the communication systems of high-technology countries. Satellite telecommunications,
cellular telephony, digital television, the Internet and personal computers show
that the convergence of telecommunications and computer technology has overturned the entire world order of Information Technology. This atypical revolution
has also had unforeseeable repercussions on the traditional methods of knowledge production and transmission. However, the effects in these fields will only
be observed in the coming decades, and they will probably turn out to be much
more far-reaching than the highly visible modifications already produced on the
world economy and finance. Commerce is increasingly based on the Internet,
with sometimes disturbing effects on the consolidated systems of dealing and of
handling goods. In the banking world, the traditional branch, thanks to the Internet, has expanded to enter into the homes of net customers, modifying both the
way users relate to the banking system and the inner organization of the banks
themselves.
Whereas in one respect these perhaps irreversible phenomena have improved
the quality of life, they have conversely made the system as a whole more fragile
and more sensitive to any recession. Adversaries of all types, compatriots or foreigners, governmental or private bodies, can sort and scan plain text they have
intercepted and selected, based on details of the address, or on convenient key
words present in the message. This improper monitoring activity has been going
on for decades, obviously even before the computer made the job so much easier. The novelty comes from the proportions and the number of customers who
entrust their personal transactions and secrets to fiber optics, to copper cables or
to the ether. The more technologically advanced a country is, the more susceptible
it will be to interception of electronic traffic. Therefore, protection of
information is becoming an unavoidable necessity to assure a society's operative
life. The technologies for protecting information have been developed in the discipline known as cryptology. For millennia, cryptology had as its main objective the
confidentiality of information, but in recent times, the technological evolution,
together with the creation of a world-wide society with integrated services and
global systems of communication, has delegated much more extensive, wider-ranging, and more complex objectives to cryptology. Specifically, the number of
services that need some form of information protection is continuously growing.
Any list would fail to be complete, but would be topped by the telephone, e-mail, e-commerce, tele-working, remote monitoring, and tele-medicine, and could continue
almost indefinitely.
1.2 Information Protection
It does not appear that the definition of a system for protecting information can
be formulated in a definitive manner through authoritative statements. Rather,
security comes from the concurrence of needs, situations, and purposes that contribute to defining the scenario in which information plays the role of principal
actor.
A system for the protection of information depends on:
1. Accuracy of the principles.
2. Robustness of the mathematical procedure used to transform the information.
3. Physical security of the technological equipment that processes the information and the environments where such devices reside.
4. Discipline of employees, by discipline meaning the mental attitude and the
behavioral attention to details that could make even the most technically-secure system vulnerable.
As just noted, security systems bring together many components of a human and
technical nature. Among these, an important role is played by cryptology and
related mathematical techniques.
1.2.1
The objectives for protecting information against deliberate manipulation in general should respond to four basic questions:
1) What information to protect?
1.2.2 Aims
The situation that pits the defender against the attacker has a dual aspect that
characterizes the two main branches into which Cryptology is partitioned: cryptography/steganography and cryptanalysis.
Cryptography/steganography pursues five main goals:
- To protect against intruders, ensuring that access to the information is reserved to authorized persons, entities, or devices.
- To protect from deliberate destruction or alteration, ensuring the data's integrity, both logical (meaning of the texts) and physical (supporting paper,
magnetic tapes, CD-ROMs, etc.).
- To prevent shadowing (authenticity), namely to ensure recognition of the
source of information.
- To prevent repudiation (signature), to ensure the impossibility of denying
the origin of a message.
- To prevent tracking, ensuring anonymity of the source and route of messages, objects or people.
The purposes of cryptanalysis are operations that may be the converse of the aims
of the above list, namely:
- To determine the contents of a message.
- To destroy a message, i.e. to deliberately prevent communication between
two parties.
- To falsify, that is to send a message as if it were from another author, such as
launching a communication with a party and being accepted as a legitimate
counterpart.
- To deny being the author of one's own message.
- To trace the origin and path of messages, objects, or people.
1.2.3 Summary
The five situations considered above are at the core of modern cryptology, and
can all be incorporated into a mathematical description in the framework of the
Shannon information theory. However, for practical purposes, it has been preferred to develop a discipline that is apparently independent, referring to information theory only for the basic principles. This will be the subject of the following chapters.
1.3 Historical glimpses
The millenary history of cryptology began in ancient Egypt at the Court of the
Pharaohs where, between sphinxes, pyramids, and plots, for millennia the power
game was played. But it was the warrior soul of Greece, with its oligarchic system, kingdoms, and ambitions of military and cultural domination, that first systematically applied a cryptographic method of which we have certain knowledge. In the silent palaces of Sparta, King Agis encrypted messages directed to
his distant generals in charge of controlling the eastern Mediterranean, by rolling
a strip of papyrus helically around a skytale (command baton) and writing his message along the length of the baton. The straightened strip of papyrus
with the encrypted message looked like a chaotic set of symbols. To read the message, the general rolled the strip up again around a baton of the same diameter. Today,
these procedures for exchanging secret messages may move us to smile. Nevertheless, they solved the problem of private communication in an acceptable way,
compatibly with the available technology.
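In modern terms, the skytale realizes a simple columnar transposition: the number of windings of the strip around the baton plays the role of the key. The following Python sketch models this (the function and parameter names are illustrative, not from the text):

```python
import math

def skytale_encrypt(text: str, turns: int, pad: str = "X") -> str:
    # Pad so the strip wraps around the baton a whole number of times.
    cols = math.ceil(len(text) / turns)
    text = text.ljust(turns * cols, pad)
    # Writing along the baton fills a grid row by row;
    # unrolling the strip reads the grid column by column.
    return "".join(text[r * cols + c] for c in range(cols) for r in range(turns))

def skytale_decrypt(cipher: str, turns: int) -> str:
    # Rewinding on a baton of the same diameter inverts the transposition.
    cols = len(cipher) // turns
    return "".join(cipher[c * turns + r] for r in range(turns) for c in range(cols))
```

Decrypting with a different number of turns (a baton of the wrong diameter) interleaves the letters incorrectly, which is exactly the weak secrecy the device provided.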
1.3.1
From the Spartan hegemony on the Aegean sea, through the grandeur of the
Roman Empire, the effervescent political and cultural milieu of the Italian Renaissance, down to the modern supra-national governments, cryptography has
been variously, but almost exclusively, used in affairs of power. The impulse to
its development was almost always given by the exigencies of war. Surely, the
complex needs of the Roman army to exchange secret messages at the time of
Gaius Julius Caesar promoted the invention and diffusion of a method for concealing information that was relatively secure and, at the same time, operatively
easy. The cryptographic method known as Caesar's cipher consisted in substituting each letter with the letter three positions onward (in the natural alphabetical
order from A to Z of the letters). For example, the letter A is substituted with D,
B with E, and so on, W being replaced by Z. The last three letters X, Y, and Z are
substituted with A, B, and C, respectively. The rule was very easy, and the number
3 was the secret key for enciphering and deciphering. The decryption operation
to recover the original message from the encrypted text consisted of the inverse
substitution, which can be described similarly: each letter is substituted with the
letter three positions before it. Technically, in jargon, this encryption rule is called
mono-alphabetic, while its generalization is called polyalphabetic substitution.
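The shift rule just described fits in a few lines of code. This minimal Python sketch (the function name is mine) applies a cyclic shift over the 26-letter alphabet, with shift 3 reproducing Caesar's rule and shift -3 the deciphering:

```python
def caesar(text: str, shift: int) -> str:
    # Shift each letter cyclically in the A-Z alphabet: with shift=3,
    # A -> D, B -> E, ..., W -> Z, and X, Y, Z wrap around to A, B, C.
    return "".join(
        chr((ord(ch) - ord("A") + shift) % 26 + ord("A")) if ch.isalpha() else ch
        for ch in text.upper()
    )

# Deciphering is the inverse substitution, i.e. a shift by -3:
# caesar(caesar("ATTACK", 3), -3) gives back "ATTACK".
```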
This general and relatively strong encryption rule (i.e. polyalphabetic substitution) was perfected by Blaise de Vigenère and reported in his Traicté des chiffres,
ou secrètes manières d'escrire, published in 1586, where a square table that bears
his name appeared, with a certain emphasis, for the first time. However, this table had already been reported in De Furtivis Literarum Notis by Giovanni Battista
Della Porta, published in 1563. The Vigenère polyalphabetic enciphering was
long considered impossible to crack. In polyalphabetic encryption, the key consists of an ordered set of numbers (or letters); for example, encrypting with a key
consisting of the numbers 3 and 12, the letters of the text, starting from the first,
are alternately transformed by mono-alphabetic substitution as in the Caesar cipher, with keys 3 and 12.
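The two-key example above (keys 3 and 12, applied alternately) generalizes to any ordered set of shifts. The sketch below, with illustrative names, cycles through the key list, so a one-element key reduces to the Caesar cipher:

```python
def poly_encrypt(text, key):
    # Shift the i-th letter by key[i mod len(key)]: with key = [3, 12] the
    # letters are alternately enciphered with Caesar keys 3 and 12.
    return "".join(
        chr((ord(ch) - ord("A") + key[i % len(key)]) % 26 + ord("A"))
        for i, ch in enumerate(text)
    )

def poly_decrypt(cipher, key):
    # Deciphering applies the opposite shifts in the same order.
    return poly_encrypt(cipher, [-k for k in key])
```

Only someone holding the same ordered key can step through the alternating alphabets in the right sequence, which is the essence of private-key encryption described next.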
The principle of substituting a message with a second message, according to a
rule governed by a secret key, easy to remember, in such a way that only the
person who has the secret key may go from the encrypted message to the original
message, constitutes the essential part of private key encryption. The first treatises about these cryptographic techniques appeared around the sixteenth century in Italy, although the first known manual of cryptography had already been
published in 1379 by one Gabriele de Lavinde of Parma, possibly a cryptographer
who served in the secretariat of Clemente VII, the antipope.
A prominent position in the literature on cryptography is occupied by De Componendis Cyfris (1466) by Leon Battista Alberti, a work in which, together with
the principle of polyalphabetic encryption, the first encryption disc is described,
and the concept of cryptanalysis is introduced. In particular, several methods for
attacking encrypted messages are proposed. Cryptanalysis was highly prized, as
testified by many books of the same period, written by scientists of the time who
also acted as court cryptographers, such as Cicco Simonetta with the Sforza family in Milan, Giovanni Soro serving the Venetian Republic, and Giovanni Battista
Argenti, who served the Pope. Interest in the art of secret writing was certainly
stimulated by the quarrelsome nature of the princes of the time, and the typical
Italian taste for political manoeuvring. Meanwhile, connoisseurs of cryptography were artists, scientists, and politicians, working directly in other sectors like,
for example, the mathematicians of the Bologna school. In this cultural environment, the contribution to cryptography made by the great Lombard mathematician Girolamo Cardano was both varied and remarkable. In his work De
Subtilitate libri XXI (1550), Cardano describes, among other subjects, a lock with
rotors that could be unlocked only by a given letter combination (the letters being written on the external side of the rotors). Among eminent Europeans who
were interested in cryptography during the fortunate Renaissance period are men
of vast cultural interests, like the already-cited Leon Battista Alberti, Giacomo
Casanova (who first intuited how to break polyalphabetic encryption), famous
mathematicians like John Wallis and François Viète, and Francis Bacon, philosopher and statesman. In the seventeenth and eighteenth centuries, progress in the
cryptographic field was slow and insignificant. The evolution restarted, quite
suddenly, around the middle of the nineteenth century, sustained this time by the
industrial revolution and by the economic and governmental interests of the great
modern States. In 1863, the important work Die Geheimschriften und die Dechiffrir-Kunst by Friedrich W. Kasiski was published. This book describes in detail how
to break, with cryptanalysis, the Vigenère encryption. Despite general works,
like the Traité de cryptographie by Lange and Sourdat, published in 1925, or the
important Manuale di crittografia by the general Luigi Sacco, published in 1936,
which is one of the best-known and most interesting cryptography treatises of
the early twentieth century, the true great leap forward toward a mathematical
theory of cryptography only occurred at the end of the Second World War. The
encrypting machines of that period were still based on the old principle of rotors, borrowed from Alberti's disc. To
meet the requirements of the globally-expanding economy, several standardization processes were started. In the 1970s, the most widely-debated system for
private key encryption was the DES (Data Encryption Standard) proposed by the
American National Bureau of Standards, and developed on an initial project by
IBM. DES, and its successor AES (Advanced Encryption Standard) may represent
the last step of the rotor machine development.
In the subsequent evolution, the word machine is still maintained, but it must
be understood as a mathematical computing procedure by means of algorithms.
All these machines are commonly known as encrypting machines. They realize
private key encryption and represent the modern variant of the Caesar cipher,
improved with tricks aimed at achieving the perfect encryption system known as
the one-time pad. Actually, this system, notoriously used in the fascinating world
of spying, encrypted a message by substituting the message letters through combination with letters taken from special positions in the pages of a booklet (pad)
used only one time. The system is practically unbreakable without knowing the
book.
1.3.2
Cryptography was treated as an art for centuries: from invisible ink to mechanisms combined with key words to open secret doors; from rings that should
combine to show incredible secret passages, to mysterious combinations of carillon notes that open fabulous strongboxes; from the vaguely cryptic love messages of Cyrano de Bergerac to the beautiful Roxane, to the light signals between
lovers in the Certosa di Parma by Stendhal. Every action, instrument, or event
contributed to make cryptography a mysterious art.
However, the needs of governments of great modern states called for something
more than a reliance on experts, however loyal, or accredited men skilled in cryptography. Thus, it is not entirely by chance that the English philosopher and
statesman Francis Bacon formulated the basic criteria that should be met by good
encryption systems. Bacon made a major contribution to the rise of a scientific
theory of cryptography. But only relatively recently has a complete axiomatic
formulation of cryptography been achieved, by merit of Claude Elwood Shannon, with the publication of his paper Communication Theory and Secrecy Systems in 1949. Actually, this paper had already been completed in 1945, but was classified material, and only after its declassification could it appear in a publicly-distributed journal. Key to this rigorous description of encryption/decryption
operations was Information Theory, a mathematical theory of communication
also due to Shannon. As a result of this approach, all impossible expectations of
cryptographic protection were abandoned. With Shannon's proof that certainty
in any cryptographic protection does not exist, the dream of perfect secrecy finally waned. All protection is of a probabilistic nature. We may only reduce the
probability of violating a secret system, but we will never achieve the certainty of
absolute inviolability.
The axiomatic formulation made cryptography a discipline similar to mathematics. Today, the prophetic words, pronounced by Adrian A. Albert at the
opening of the 382nd Conference of the American Mathematical Society in 1939,
are astonishingly concrete:
We shall see that cryptography is more than a subject permitting mathematical formulation for indeed it would not be an exaggeration to state that
abstract cryptography is identical with abstract mathematics.
Chapter 2
The Shannon theory of secrecy
systems
There is nothing more difficult to take in
hand, more perilous to conduct, or more uncertain in its success, than to take the lead in
the introduction of a new order of things.
Niccolò Machiavelli
2.1 Introduction
The theoretical foundation of modern cryptography can indisputably be attributed to Claude Elwood Shannon, with his 1949 publication of the paper Communication Theory and Secrecy Systems, in the Bell System Technical Journal [68].
The paper, almost surely completed in 1945, was considered classified material
(a document or information is classified when it is considered important for national security), and it was declassified only four years later, just before its appearance in the open literature. This paper that founded cryptology followed by
one year an even more important, if possible, paper by Shannon, that is A Mathematical Theory of Communication, which appeared in 1948 in the same Bell System
Technical Journal. This paper was determinant for the theoretical and practical
development of telecommunications systems [69]. In it, after having introduced
axiomatically a measure of information, Shannon developed a totally new theory
by means of which to describe all problems concerning the transmission, storage,
and transformation of information. Shannon's measure of information has the character of a physical quantity, analogous to the measure of surfaces, energy, time,
speed, etc.
This definition of a measure of information has enabled the true nature of information to be established, tied to uncertainty and to discrete character. It has also
permitted a better comprehension of the mechanisms conveying information, and
in cryptography the understanding of the achievable limits of data security.
[Figure: transmission over a noisy channel: the ENCODER sends the message M, the channel corrupts it with noise N, and the DECODER receives E = G(M, N).]
2.2
Claude Shannon considered information as a reduction of uncertainty, thus connecting the measure of information to the physical measure of uncertainty, namely
entropy. With admirable mathematical rigor (although Shannon was entering
completely new territory) he deduced the measure of information from a set of
axioms composed of the axioms at the basis of mathematical measure theory,
completed by a specific axiom for dealing with information. The resulting measure was the classic entropy, which had been introduced, in the nineteenth
century, for sizing the uncertainty of systems of particles like molecules, atoms,
photons, etc.
In the following we recall some results from information theory, as they were
derived by Shannon, which are indispensable to cryptographic uses.
Let A = {a_i}_{i=1}^N be an alphabet of N symbols, and let M = A^m be the set of
messages of length m over the alphabet A.
Assuming that the messages M ∈ M are random events of a stationary stochastic process characterized by a probability distribution p(M), the entropy of the
messages H(M) is defined as the average

    H(M) = \sum_{M \in A^m} p(M) \ln \frac{1}{p(M)} ,            (2.1)

and the source entropy, i.e. the entropy of the alphabet H(A), is defined as the
limit

    H(A) = \lim_{m \to \infty} \frac{1}{m} \sum_{M \in A^m} p(M) \ln \frac{1}{p(M)} .            (2.2)
The entropy H(A) is a nonnegative number that evaluates the uncertainty reduction relative to a symbol of the alphabet A as a consequence of its emission.
Shannon's interpretation of the entropy was that H(A) represents the amount
of information that, on the average, a symbol of the stream M can support. If
the source is memoryless, i.e. the symbols are emitted independently with probabilities p(a), the source entropy reduces to

    H(A) = \sum_{a \in A} p(a) \ln \frac{1}{p(a)} .            (2.3)
The entropy H(A), intended as a function of the vector P = (p(a_1), ..., p(a_N)), with
a_i ∈ A, of dimension N, is a concave function that assumes the maximum value
H(A) = ln N when all symbols are used with the same probability, that is p(a) = 1/N.
In this model we have

    H(M) = H(A^m) = m H(A) .
When a message M ∈ M is sent over a real communication channel, it is corrupted by some noise, given by a block N of symbols from the same alphabet A.
The received message is E = G(M, N) ∈ E. The noisy channel may ultimately be
seen as a discrete memoryless source of random symbols from the same alphabet
A, which corrupt the message M. The received message E may be different from
M; nevertheless, E yields some information on the sent message M.
The most important parameter introduced by Shannon for describing these situations was the mutual information I(M, E), defined as

    I(M, E) = \sum_{M \in A^m, E \in A^m} p(M, E) \ln \frac{p(M|E)}{p(M)} ,            (2.4)
where p(M, E) is the joint probability distribution of the sent and received messages, and p(M|E) is the corresponding conditioned probability distribution.
I(M, E) represents the amount of information that each received symbol gives,
on average, about the transmitted symbol.
Mutual information is non-negative: if it is zero, the channel is useless, since the receiver could just as well produce symbols at random instead of listening to the transmission. The following equations connect mutual information and entropies:
    I(M, E) = H(M) + H(E) − H(M, E) = H(M) − H(M|E) ,
where H(M, E) and H(M|E) are, respectively, the joint entropy and the conditional entropy of the input and output symbols of the channel. H(M|E) may be
interpreted as the amount of information that is still needed to specify M completely, when the received symbol E is known.
2.3

In search of perfect secrecy, Shannon conceived the idea of modeling encryption operations as transmission over a very noisy channel, where the messages
M ∈ M from an information source are corrupted by noise to such an extent that
they cannot be recognized by any unauthorized observer, but this noise should
be exactly removable by the authorized user who knows the secret key.

[Figure: secrecy system: the key K reaches the receiver over a private channel, the cryptogram E = F(M, K) is sent over a public channel, and the authorized receiver computes M = G(E, K).]

The cryptogram is produced from the message M and the key K by an enciphering function

    E = F(M, K) ,            (2.5)

and the message is recovered by the authorized user through a deciphering function

    M = G(E, K) .            (2.6)

The conditional entropy of the message, given the intercepted cryptogram,

    H(M|E) = \sum_{M \in A^m, E \in A^r} p(M, E) \ln \frac{1}{p(M|E)} ,            (2.7)

and the conditional mutual information, given the key,

    I(M, E|K) = \sum_{M \in A^m, E \in A^r} p(M, E|K) \ln \frac{p(M|E, K)}{p(M|K)} ,            (2.8)
are equally of interest in defining the ideal cryptographic situation, which may
be taken as reference.
Definition 2.1. An encryption process is called perfect when the mutual information
I(M, E) is zero, while the conditional mutual information I(M, E|K) yields all the information of the original message, that is
    I(M, E) = 0   and   I(M, E|K) = H(M) .            (2.9)
In other words, assuming that the key is not known, the encryption of a message is perfect when the received message does not give any information on the
original message. In terms of entropies the equation (2.9) implies
H(E) = H(E|M) ,
(2.10)
which means that the entropy of the encrypted message is not reduced by the
knowledge of the original message. Furthermore,
H(E|K) − H(E|M, K) = H(M) ,
that is H(E|K) = H(M) since H(E|M, K) = 0. In conclusion, knowledge of the
key must enable the original message to be obtained from the encrypted message, without any further uncertainty. Operatively, this condition assures that
the decrypting operation is possible, i.e. that the function G(., .) exists. Since the
relation between the key and the encrypted message is described by the equation
(2.5), the entropy of the encrypted messages, knowing the original messages, is not greater than the entropy of the keys, H(K) ≥ H(E|M); then, using (2.10), we have the chain of inequalities

H(K) ≥ H(E|M) = H(E) ≥ H(E|K) = H(M) .   (2.11)
The interpretation of this last equation is that to achieve perfect secrecy, the entropy of the keys should not be smaller than the entropy of original messages.
Very frequently, messages, keys, and encrypted messages are all obtained by concatenating statistically-independent symbols from the same alphabet A. In this
case we have H(M) = mH(A), and H(K) = kH(A), thus the inequality among entropies implies k ≥ m for perfect secrecy. We may collect the previous result in
a significant statement
Proposition 2.1. In order to achieve perfect secrecy, it is necessary that the entropy of
the keys not be smaller than the entropy of the original messages. Equivalently in terms
of alphabet symbols, the length of the key should not be shorter than the length of the
original message.
Historically, perfect secrecy was achieved by the classical encryption systems
adopted by spies, and known as a one-time pad, where a book (pad) was used
only once with a precise algorithm for specifying which character to use in which
page, for encrypting the characters of the original message.
Short keys In most cases, the key set K has fixed small dimension k, while both
m and r may grow indefinitely. Therefore, it is impossible to satisfy the condition
(2.9) for perfect secrecy.
In this fairly common situation, to evaluate the amount of information that the
encrypted stream gives about the plain-text stream, we make two assumptions:
1) the choice of key is independent of message generation, and 2) m is always
greater than k (actually this is not a real constraint, but avoids repeating this
condition each time). It follows that the conditional entropy equals the key entropy,
that is
H(M|E) = H(K) .
A large class of encryption schemes consists of cryptographic transformation
working symbol by symbol, that is each symbol mi in the message is changed
into an encrypted symbol Ei as
Ei = f (mi , Ki ) ,
where Ki is a symbol of a secret key stream. Assuming that the symbols mi are
statistically independent in each message, then we have
H(M) = mH(A) ,
while the entropy of the keys, assuming that each key is generated by a mechanism that produces k statistically-independent symbols, and the remaining r − k symbols in a deterministic way, starting from the previous k random symbols, is
H(K) = kH(A) .
Clearly, we cannot have perfect secrecy, because the key entropy is limited: the
mutual information exchanged by original message and encrypted message turns
out to be
I(M, E) = (m − k)H(A) ,
that is, the information given by the encrypted message about the original message grows linearly with the message length. Nevertheless, this scheme is one
of the most widely used in practice. It is reasonably secure, provided that H(K)
is sufficiently large to prevent a computational evaluation of K, even under the
assumption that plain and cipher text are known.
In the previous arguments, it was tacitly assumed that the entropy of the encrypted message and the entropy of the original message are equal. However, in
some circumstances, it may be profitable to consider encrypted messages having
entropies larger than the entropies of the corresponding original messages; this
situation is quantified by the encryption rate.
Definition 2.2. Let M = Am , K = Ak , and E = Ar be the sets consisting of blocks of
m, k, and r symbols from an alphabet A, respectively. The parameters m and r denote the
length of the plain and cipher texts, respectively, and k is the length of the key. The ratio
ρ = \frac{m}{r}

is called the encryption rate, while the ratio

\frac{m}{r + k}

also takes the length k of the key into account.
2.3.1
Stream encryption
The typical encryption operation of binary sequences is the simple and fast binary
sum modulo 2
e_i = m_i + k_i mod 2 .
This is the standard encryption operation of stream ciphers, which are machines
generating the key stream ki starting from a short key (the secret key) K. This
encryption procedure will be described in greater detail in Chapter 4.
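A minimal sketch of this mechanism follows (ours; the keystream expansion by iterated SHA-256 is only a stand-in for the stream generators described in Chapter 4).

```python
import hashlib

def keystream(secret_key: bytes, n: int) -> bytes:
    """Expand a short secret key into n keystream bytes by iterated
    hashing; an illustrative stand-in for a real stream generator."""
    out, block = b"", secret_key
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]

def xor_cipher(data: bytes, secret_key: bytes) -> bytes:
    """e_i = m_i + k_i mod 2, applied bitwise: the same call
    performs both encryption and decryption."""
    return bytes(d ^ k for d, k in zip(data, keystream(secret_key, len(data))))

msg = b"ATTACK AT DAWN"
cipher = xor_cipher(msg, b"short key")
plain = xor_cipher(cipher, b"short key")  # XORing twice restores the message
```

The perfect symmetry of the XOR operation is visible here: encryption and decryption are literally the same function.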
2.4
Cryptology
Cryptology is the scientific discipline that deals with the methodological and mathematical aspects of protecting information against deliberate alterations or intrusions.
It is sub-divided into two great branches with opposite aims:
cryptography/ steganography aim to develop protection methods of different sorts;
cryptanalysis aims to break protection systems and methods.
The two branches are closely correlated, and the design of good cryptographic
functions cannot avoid in-depth cryptanalysis. Nevertheless, the mathematical
methods are quite different.
Cryptography deals with methods for protecting information, in particular to
achieve:
1. Confidentiality in any sort of communication, that is to assure the privacy of the exchanged messages.
2. Integrity in any sort of communication, that is to maintain unchanged
a not necessarily secret message against deliberate alterations.
3. Authenticity of the interlocutors, that is to guarantee the identity of the
conversation partners, namely sender and recipient of a message.
4. Non-repudiation of the authorship by the authentic signer, that is to
guarantee the recipient that the signer cannot reject the authorship.
Steganography deals with the protection achieved by hiding the existence of the
message, which may be embedded in innocent information, or in any unsuspected object.
Cryptanalysis deals with the offensive side, that is with developing attack methods to violate the characteristics of the message protected by cryptography. In particular, typical cryptanalysis actions are:
1. Retrieving the text of a message protected by cryptography, having at one's disposal only partial information.
2. Altering or destroying plain or encrypted messages.
3. Fraudulently impersonating the legitimate interlocutor.
2.5
Cryptography
The abstract solution of this problem, in Diffie and Hellman's conception, formally introduced the new concept of one-way function, or better, a new way to interpret the traditional cryptographic functions. These traditional cryptographic
functions, they said, should satisfy an asymmetric complexity not in the way they
are used, but between the modes of application and their cryptanalysis. Diffie
and Hellman introduced a function F (.) that should satisfy the following properties:
1) To be easily specified together with its inverse function F^{-1}(.);
2) To be easy to compute (message encryption) C = F(M), given M, a plain message;
3) To be hard to compute F^{-1}(.) from the sole knowledge of F(.);
4) To be easy to compute the plain message M = F^{-1}(C), given the encrypted message C.
Conditions 3) and 4) are not contradictory, because the goal is to easily retrieve the
message, having designed both F(.) and F^{-1}(.), but that the uninitiated cannot obtain the inverse function simply from the knowledge of the function F(.).
This problem has originated public-key cryptography, which was immediately
useful in telecommunications systems, in the management of computing resources, and in many banking operations.
One-way functions. The notion of one-way function is fundamental in public-key cryptography, although the existence of such functions is still questionable, as it has not been
formally proved. A large number of systems base their security on a mathematical notion, whose significance has not been rigorously proved.
Many potential one-way functions have been proposed, however the only surviving candidates are borrowed from elementary number theory. Precisely, the only
known putative one-way functions are based on
1. the difficulty of factoring integer numbers
2. the difficulty of computing the discrete logarithm in certain representations of cyclic groups of prime order
3. the difficulty of decoding linear error-correcting codes that lack symmetry.
The idea of using the difficulty of computing the discrete logarithm in convenient
cyclic groups was introduced by Diffie and Hellman.
The idea of using the difficulty of factoring was introduced by Rivest, Shamir,
and Adleman with the description of the algorithm known as RSA.
Lastly, the idea of using the difficulty of decoding error-correcting codes is due to
McEliece.
All these problems are the object of intense research, but the main objective remains a rigorous proof of the existence of one-way functions.
The secret encryption key is a sequence from the same set ZN , and of the same
length
K1 , K2 , . . . , Ki , . . . .
The cipher text is obtained by composition on a symbol-by-symbol basis

E_i = M_i + K_i mod N   ∀ i ,

and decryption reverses the operation as

M_i = E_i − K_i mod N   ∀ i .
Conversely, assuming that the key is a sequence of equally-probable and statisticallyindependent symbols, it is impossible to obtain the plain text from a knowledge
of the cipher text only. We have p{E_i = j} = p{M_i + K_i = j}, for every j ∈ Z_N, that is

p{E_i = j} = \sum_{\ell=0}^{N-1} p{M_i = j − \ell \bmod N}\, p{K_i = \ell} = \frac{1}{N} ,
which means that the symbols in the encrypted message are also equally probable and statistically independent.
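The computation above can also be observed empirically. In the sketch below (ours, taking the alphabet Z_26), a completely biased plain text, with all symbols equal, still produces cipher-text symbols whose empirical counts are nearly uniform.

```python
import random

N = 26  # alphabet Z_N

def otp_encrypt(msg, key):
    """E_i = M_i + K_i mod N, symbol by symbol."""
    return [(m + k) % N for m, k in zip(msg, key)]

def otp_decrypt(cipher, key):
    """M_i = E_i - K_i mod N."""
    return [(e - k) % N for e, k in zip(cipher, key)]

rng = random.Random(1)
msg = [0] * 10000                      # extreme case: a constant plain text
key = [rng.randrange(N) for _ in msg]  # uniform, independent key symbols
cipher = otp_encrypt(msg, key)
counts = [cipher.count(j) for j in range(N)]
# every count lies near 10000/26 ≈ 385: the cipher text looks uniform,
# and without the key nothing about the plain text can be inferred
```

Knowledge of the key, of course, recovers the message exactly, symbol by symbol.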
Shannon's perfect secrecy is thus possible. However, it is not practical for use by
a large number of people in unrestricted scenarios. In the real world, the implementation of cryptographic schemes calls for more pragmatic solutions, and thus
some basic principles that should be followed by any good cryptographic system
have been formulated.
In 1883, Auguste Kerckhoffs wrote two journal articles on La Cryptographie Militaire, in which he stated six design principles for military ciphers. Most of them
are now redundant because of computers, but two are still valid:
1. Encryption and decryption rules should be easy to implement. Actually this rule
was already formulated hundreds of years before by Francis Bacon.
2. The encryption rule must not be required to be secret, and it must be able to fall into
the hands of the enemy without inconvenience. In other words, security should
lie entirely in the secrecy of the key. This is now known as Kerckhoffs' principle.
2.6
Steganography
Chapter 3
Random Sequences and Statistics
The very name "calculus of probabilities" is a paradox. Probability opposed to certainty is what we do not know, and how can we calculate what we do not know?
H. Poincaré, Science and Hypothesis
3.1
Introduction
In this chapter we will consider sequences that may look like random sequences
or, more technically, discrete stochastic sequences. Random sequences, besides
playing a key role in cryptography, have many applications in other fields: in
spread-spectrum techniques, in communication theory, in testing digital devices,
in satellite navigation and localization systems, in computer simulation, to limit
the list to some important applications.
Consider for example the following binary sequences
1) . . . , 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, . . .
2) . . . , 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, . . .
3) . . . , 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, . . .
4) . . . , 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, . . .
we may ask whether they have been generated by some random process, or have
a deterministic origin. In fact:
- sequence 1) was generated by a deterministic mechanism
- sequence 2) was generated by flipping a coin
- sequence 3) was generated considering the binary representations of the first 6 digits in the decimal part of π
- sequence 4) was generated by a primitive linear feedback shift register of
length 4.
Unfortunately, Lehmer's definition of a random sequence does not exclude paradoxical situations, because the tests are always done on a finite number of instances and suffer from the limits of this kind of statistics. However, a choice must be made, in spite of possible errors. Lehmer's view is unavoidable if we
want to obtain conclusions of practical value.
A key role is always played by the sample space, which depends on the uses the
sequence is to be put to. The sample space specifies the set with respect to which
we want the statistics.
3.1.1
Sample Spaces
What is the probability of two given people having been born on the
same day?
This probability, 1/365, is obviously different from the probability 1/365², which is the answer to the question

What is the probability of two given people having been born on the same given day?
The specification paradox comes from the infrequent event of two people, who
are not familiar with statistics, who meet and discover that they were born on
the same day: they believe that the event is very rare, and motive for wonder.
Actually, the probability of the event is 1/365, a figure that makes the event not so rare in everyday life.
The birthday problem is also generalized as follows:

What is the probability that, among n (≤ 365) given people, at least two of them have been born on the same day?
The probability is computed as the complement to 1 of the probability that all people have been born on different days. This probability is computed considering the sample space of n-tuples with entries from the set N = {1, 2, . . . , 365}
which represents a numbering of the days of the year. The size of the space is
365n , whereas the number of favorable events consists of the n-tuples with all different entries. This number is obtained considering that the first position may be
any number in N , the second position may be any number in N different from
that in the first position and so on; thus the total number of n-tuples is
365 · 364 · 363 · · · (365 − n + 1) .

The resulting probability is

p{collision} = 1 − \frac{365 · 364 · 363 · · · (365 − n + 1)}{365^n} .

The probabilities for various values of n are reported in the following Table.
n    p{non repetition}   p{repetition}
2        0.997               0.003
5        0.973               0.027
10       0.883               0.117
20       0.589               0.411
22       0.524               0.476
23       0.493               0.507
24       0.462               0.538
40       0.109               0.891
60       0.006               0.994
It is sometimes seen as surprising that the collision probability exceeds 0.5 with only 23 people. However, the probability that at least two people among 23 are born on a given day is 0.00186, a value of no particular interest.
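The table values can be reproduced with a few lines of code (a sketch, ours):

```python
def p_no_repetition(n: int) -> float:
    """Probability that n people (n <= 365) all have distinct birthdays,
    computed as the product 365/365 * 364/365 * ... * (365-n+1)/365."""
    p = 1.0
    for i in range(n):
        p *= (365 - i) / 365
    return p

# the collision probability first exceeds 1/2 at n = 23
print(round(p_no_repetition(23), 3))      # 0.493
print(round(1 - p_no_repetition(23), 3))  # 0.507
```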
Coin tossing. The experiment of tossing a coin, which lands on either one or
the other of its two sides, called head and tail, has two possible outcomes. In the
case of a single toss, the sample space has two elements that will be denoted as
{H, T }. Let p be the probability of getting H and 1 − p the probability of getting a tail; if p = 1/2 we say that the coin is fair. Consider the case of two experiments.
One may toss two indistinguishable coins simultaneously, or one coin twice. The
difference is that in the second case we can easily differentiate between the two
throws. If two indistinguishable coins are tossed simultaneously, there are just
three possible outcomes, {H, H}, {H, T}, and {T, T}. If one coin is tossed twice,
there are four distinct outcomes: HH, HT, TH, TT. Thus, depending on the nature
of the experiment, there are 3 or 4 outcomes, with the sample spaces

{H, H}, {H, T }, {T, T }   (indistinguishable coins)
HH, HT, T H, T T           (distinguishable coins) .
Dice rolling. The experiment of rolling a die has six possible outcomes, and the sample space consists of 6 elements: {1, 2, 3, 4, 5, 6}. To each face is associated a probability p{i}; the die is said to be fair if p{i} = 1/6.
A second experiment is rolling two dice. If the dice are distinct or if they are
rolled successively, there are 36 possible outcomes, i.e. the sample space is:
{11, 12, . . . , 16, 21, 22, . . . , 66} .
If they are indistinguishable, then some pairs of outcomes, like 12 and 21, become one. There are (6 · 5)/2 = 15 such pairs, giving the total number of possible outcomes as 36 − 15 = 21. In this case, the sample space is

{11, 12, . . . , 16, 22, 23, . . . , 66} .
When we throw two dice we are often interested not in the individual numbers
that show up, but in their sum. The sum of the two top numbers is an example
of a random variable, say s(a, b) = a + b (where a and b range from 1 through 6), that takes values from the set {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, which is the sample space for the sum in a single throw. If the dice are fair, the probability distribution on the sample
space is
p{2} = p{12} = 1/36 ,  p{3} = p{11} = 2/36 ,  p{4} = p{10} = 3/36 ,
p{5} = p{9} = 4/36 ,  p{6} = p{8} = 5/36 ,  p{7} = 6/36 .
3.2
Statistical Tests
In this section we describe a set of statistical tests suitable for binary sequences.
The tests may be easily adapted to random sequences over any alphabet; however, the explanations are more direct in the binary case, which is also the most common in cryptography and in the cryptanalysis of enciphered messages.
χ²-test. This test is one of the most important in statistics for checking the fitting
of empirical distributions versus theoretical distributions. The problem is to test
whether a given set S = {x1 , x2 , . . . , xN } of samples come from a given probability distribution P = (p1 , p2 , . . . , pk ) over k categories {u1 , u2 , . . . , uk }, specifying
also some confidence intervals with relative probability.
Let Yi be the number of samples in S that belong to the category ui . If the probability distribution P is true, the expected value E[Yi ] is N pi so that the difference
Y_i − N p_i is expected to be small.
Thus the sum \sum_i (Y_i − N p_i)^2 should be small, and could be taken as a statistic; however, this sum is not very independent of the probability distribution P, because the differences Y_i − N p_i with p_i small count like the differences with p_i large. It was found that a statistic that suffers less from this effect may be obtained by weighting (Y_i − N p_i)^2 with \frac{1}{N p_i}. The resulting statistic U^2 may be evaluated as

U^2 = \sum_{i=1}^{k} \frac{(Y_i − N p_i)^2}{N p_i} = \frac{1}{N} \sum_{i=1}^{k} \frac{Y_i^2}{p_i} − N .
It can be shown that this statistic tends to the χ² statistic with ν = k − 1 degrees of freedom when N becomes large. Thus we may assume that the distribution of
U^2 is the χ² density

f_ν(y) = \frac{y^{ν/2−1} e^{−y/2}}{2^{ν/2} Γ(ν/2)} ,   y > 0 ,

where ν is the degree of freedom, i.e. the number of squares of normal random variables that enter in the sum χ², [47], and Γ(t) = \int_0^∞ x^{t−1} e^{−x} dx is the gamma function [84].
         p = 1%   p = 5%   p = 50%  p = 95%  p = 99%
ν = 1    0.00016  0.0039   0.4549   3.841    6.635
ν = 2    0.0201   0.1026   1.386    5.991    9.21
ν = 5    0.5543   1.1455   4.351    11.07    15.09
ν = 10   2.558    3.94     9.342    18.31    23.21
ν = 30   14.95    18.49    29.34    43.77    50.89
x_p      -2.33    -1.64    0.00     1.64     2.33
Table I - Confidence intervals of the χ² distribution
The asymptotic expression for large ν of the upper extreme of the confidence interval is

ν + \sqrt{2ν}\, x_p + \frac{2}{3} x_p^2 − \frac{2}{3} + O\left(\frac{1}{\sqrt{ν}}\right) .

As an example, consider the following face counts observed by rolling two dice:

face     1   2   3   4   5   6
dice 1  10  20  18  22  16  14
dice 2  16  14  18  12  18  16
We want to estimate to what extent each single die is fair. The degree of freedom is 6 − 1 = 5 in both cases. In the case of dice 1, the value of χ² is 5.6; from Table I we see that this value is neither too big nor too small: a value not exceeding 5.6 occurs some 60% of the time. Thus dice 1 seems to be fair. In the case of dice 2, the value of χ² is 2.24, and a value not exceeding 2.24 occurs only some 20% of the time: the experimental results are suspiciously close to the theoretical distribution, so dice 2 is very probably not fair.
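The statistic U² is easily computed; the sketch below (ours) applies it to the counts of dice 1 read off the table, against the fair-die probabilities p_i = 1/6.

```python
def chi_square(counts, probs):
    """U^2 = sum_i (Y_i - N p_i)^2 / (N p_i)."""
    n = sum(counts)
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))

fair = [1 / 6] * 6
dice1 = [10, 20, 18, 22, 16, 14]  # N = 100 throws of dice 1
print(chi_square(dice1, fair))     # 5.6, judged with 6 - 1 = 5 degrees of freedom
```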
Zero-One counting. The test is based on counting the numbers of 0s and 1s. Let a block of N binary symbols be given,

b_1 , b_2 , . . . , b_N    b_i ∈ {0, 1} ,

and let n_0 and n_1 denote the numbers of 0s and 1s, respectively. If the sequence is truly random, then

E[n_0] = E[n_1] = \frac{N}{2} ,   \sqrt{E[(n_0 − E[n_0])^2]} = \frac{\sqrt{N}}{2} .

Let 99% be a confidence (or probability) level; assume N = n_0 + n_1, with N ≥ 100 to avoid inconsistencies; then, with a probability of 0.99, both n_0 and n_1 are included in the interval I_{99}

\left[ \frac{N}{2} − 3 \frac{\sqrt{N}}{2} ,\; \frac{N}{2} + 3 \frac{\sqrt{N}}{2} \right] .
Test: Given a block of N binary symbols, count the total numbers n_0 and n_1 of 0s and 1s, respectively. The test of randomicity is passed, with 99% confidence, if both n_0 and n_1 are included in I_{99}.
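A direct implementation (ours) of this test follows; note that the purely periodic sequence 0, 1, 0, 1, . . . passes it, illustrating that each single test captures only one aspect of randomness.

```python
import math

def zero_one_test(bits) -> bool:
    """Pass iff both n_0 and n_1 lie in [N/2 - 3*sqrt(N)/2, N/2 + 3*sqrt(N)/2]."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    dev = 3 * math.sqrt(n) / 2
    return abs(n0 - n / 2) <= dev and abs(n1 - n / 2) <= dev

print(zero_one_test([0, 1] * 5000))  # True: balanced, though not random at all
print(zero_one_test([1] * 10000))    # False: all ones
```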
Group counting. This test is based on counting the numbers of identical patterns of bits, taken k by k. Since a pattern b_0 , b_1 , . . . , b_{k−1} of k bits can be interpreted as an integer number m_i = \sum_j b_j 2^j lying between 0 and 2^k − 1, the test is equivalently based on counting the number N_i of occurrences of m_i for every i. Assuming that N is sufficiently large to avoid inconsistencies, if the given sequence is a truly random sequence, the numbers m_i are uniformly distributed, and the expected number of each of them is N/2^k. The standard deviation is the same for every statistic N_i:

\sqrt{E[(N_i − E[N_i])^2]} = \frac{\sqrt{N (2^k − 1)}}{2^k} .

The confidence interval is defined in exactly the same way as in the previous case. Let 99% be a confidence (or probability) level; then, with a probability of 0.99, every N_i is included in the interval I_{99}

\left[ \frac{N}{2^k} − 3 \frac{\sqrt{N (2^k − 1)}}{2^k} ,\; \frac{N}{2^k} + 3 \frac{\sqrt{N (2^k − 1)}}{2^k} \right] .

Test: Given a block of N binary symbols, count the number N_i of patterns corresponding to the same number m_i, for every i. The test of randomicity is passed, with 99% confidence, if every N_i is included in I_{99}.
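A sketch of the test (ours; here N counts the k-bit patterns, as in the formulas above):

```python
import math

def group_counting_test(bits, k) -> bool:
    """Read the block as N = len(bits)//k patterns of k bits, each an
    integer m in [0, 2^k - 1]; pass iff every count N_i lies within
    N/2^k +/- 3*sqrt(N*(2^k - 1))/2^k."""
    n = len(bits) // k
    counts = [0] * (1 << k)
    for i in range(n):
        m = sum(bits[i * k + j] << j for j in range(k))
        counts[m] += 1
    center = n / 2 ** k
    dev = 3 * math.sqrt(n * (2 ** k - 1)) / 2 ** k
    return all(abs(c - center) <= dev for c in counts)

# a block cycling evenly through the four 2-bit patterns passes for k = 2 ...
print(group_counting_test([0, 0, 1, 0, 0, 1, 1, 1] * 100, 2))  # True
# ... while the alternating block, which passed the zero-one test, fails it
print(group_counting_test([0, 1] * 400, 2))                    # False
```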
Run counting. Let X_1 , X_2 , . . . , X_N be a block of binary symbols; define the number of 1s

S = \sum_{i=1}^{N} X_i ,

and the correlation

R_1 = \sum_{i=2}^{N} X_i X_{i−1} .
It is easily seen that the difference D = S − R_1 gives the number of runs of 1s: in fact a run of length k of 1s is a pattern of the form 01 . . . 10, where we have k consecutive 1s. Its R_1 correlation is the pattern obtained as
0 1 1 ... ... 1 0 ...
... 0 1 1 ... 1 1 0
... 0 1 1 ... 1 0 0
in the third row the run of 1s has length k − 1. It follows that the difference between the number of 1s in the original pattern and the number of 1s in the pattern obtained by shifting and multiplication is exactly 1 for every run, thus D counts exactly the number of runs of 1s. In a truly random sequence, the symbols X_i are statistically independent and equiprobable. Thus the joint probability distribution of the pair S and R_1 is [45, p.11]

p{S, R_1} = \binom{N − S + 1}{S − R_1} \binom{S − 1}{R_1} \left(\frac{1}{2}\right)^N .
Since we are interested only in the statistic D, we must sum over the values of S and R_1 whose difference is D, obtaining

p{D} = \binom{N + 1}{2D} \left(\frac{1}{2}\right)^N .

The average value of D can be computed directly, and turns out to be (N + 1)/4, while its variance is (N + 1)/16.
Assuming a confidence level of 99%, the number of runs of 1s is included in the interval I_{99}

\left[ \frac{N + 1}{4} − 3 \frac{\sqrt{N + 1}}{4} ,\; \frac{N + 1}{4} + 3 \frac{\sqrt{N + 1}}{4} \right] .
Test: Given a block of N binary symbols, count the number S of 1s and the correlation R_1, and obtain D = S − R_1. The test of randomicity is passed, with 99% confidence, if D is included in I_{99}.
The same test can be done with respect to the runs of 0s.
Up and down runs. Let us consider a sequence of 18 numbers

1  3  10  4  2  7  12  16  15  9  7  6  4  5  6  3  2  12
  +  +   −  −  +  +   +   −   −  −  −  −  +  +  −  −  +

where, below each gap, the symbol + or − denotes whether the number is greater or smaller than the previous one. We assume that adjacent numbers are always different, or that the probability of the event that two adjacent numbers are equal is zero. A sequence of N numbers has N − 1 changes.
Assuming that the numbers are identically distributed, the number r of runs up plus runs down (in the example above r = 7) is asymptotically distributed according to the normal distribution, with average (2N − 1)/3 and variance (16N − 29)/90.
We obtain a test of the hypothesis that the numbers in the sequence are uniformly distributed, assuming that the statistic r (i.e. the number of runs up and runs down) is normally distributed with the given average and variance.
Test: Given a block consisting of N digits, the test of randomicity is passed, with 99.7% confidence, if the statistic r is included in the interval

\left[ \frac{2N − 1}{3} − 3 \sqrt{\frac{16N − 29}{90}} ,\; \frac{2N − 1}{3} + 3 \sqrt{\frac{16N − 29}{90}} \right] .
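Counting r for the 18-number sequence above is straightforward (a sketch, ours):

```python
import math

def up_down_runs(seq) -> int:
    """Number of maximal runs of consecutive rises (+) and falls (-)."""
    signs = [1 if b > a else -1 for a, b in zip(seq, seq[1:])]
    return 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)

seq = [1, 3, 10, 4, 2, 7, 12, 16, 15, 9, 7, 6, 4, 5, 6, 3, 2, 12]
r = up_down_runs(seq)          # 7, as in the example
n = len(seq)
mean = (2 * n - 1) / 3         # 35/3 for n = 18
halfwidth = 3 * math.sqrt((16 * n - 29) / 90)
print(mean - halfwidth <= r <= mean + halfwidth)  # True: the test is passed
```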
Monte Carlo tests. The set of numerical techniques employing random numbers to evaluate integrals or functions, otherwise difficult to compute, bears the name of Monte Carlo methods.
The randomicity tests are defined considering the numerical evaluation by Monte Carlo techniques of integrals whose value is exactly known. The discrepancy between exact and Monte Carlo evaluation of the integral is a measure of the goodness of random sequences.
Consider the integral

I_f = \int_0^1 f(x)\, dx .
Given a putative random sequence of N numbers X_i ∈ [0, 1], we define the sum
f_N = \frac{1}{N} \sum_{i=1}^{N} f(X_i) ,
which represents an estimation of the value of the integral I_f, since, taking the expectation of f_N, we have

E[f_N] = \frac{1}{N} \sum_{i=1}^{N} E[f(X_i)] = E[f] .
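For instance (our choice of integrand), estimating \int_0^1 x^2 dx = 1/3 with a good generator gives a sample mean close to the exact value:

```python
import random

def monte_carlo(f, n, rng) -> float:
    """f_N = (1/N) * sum_i f(X_i): the sample-mean estimate of the
    integral of f over [0, 1]."""
    return sum(f(rng.random()) for _ in range(n)) / n

rng = random.Random(42)
estimate = monte_carlo(lambda x: x * x, 200_000, rng)
# the exact value is 1/3; the discrepancy measures the sequence quality
print(abs(estimate - 1 / 3))
```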
[Table: test functions f with their exact means E[f] (e.g. mean 1/2 and variance 1/12 for f(x) = x, together with trigonometric examples involving sin²u/u and sin 2u/(2u)); equation (3.1) introduces a counting statistic \sum_{i=1}^{N} u(x_i, y_i) used to estimate the area D of a region.]
Given N samples X_1 , X_2 , . . . , X_N of a random variable putatively with probability density f(x), and cumulative probability density F(x), let the samples be re-ordered in an increasing way, i.e. X_{i+1} ≥ X_i; then define the two statistics

K_N^+ = \sqrt{N} \max_j \left[ \frac{j}{N} − F(X_j) \right]   j = 1, 2, . . . , N
K_N^- = \sqrt{N} \max_j \left[ F(X_j) − \frac{j − 1}{N} \right] .

Note that the random variable Y = F(X) is uniformly distributed in the [0, 1] interval, therefore the probability distributions of K_N^+ and K_N^- can be deduced referring to a random variable with uniform density in the [0, 1] interval.
The test consists of looking at the values K_N^+ and K_N^- to determine whether they are sufficiently high or low. These values should be compared with those given
in the following Table, borrowed from [47, page 48]. For example, the probability p percentile point is asymptotically y_p, with

y_p^2 = \frac{1}{2} \ln \frac{1}{1 − p} .

Since in any reasonable test N is large, with probability p, K_N^+ or K_N^- asymptotically are y_p − \frac{1}{6\sqrt{N}} or less.
        p = 1%  p = 5%  p = 50%  p = 95%  p = 99%
n = 1   0.0100  0.0500  0.5000   0.9500   0.9900
n = 10  0.0291  0.1147  0.5426   1.1658   1.4440
n = 20  0.0381  0.1298  0.5547   1.1839   1.4698
n = 30  0.0435  0.1351  0.5605   1.1916   1.4801
Percentage points of the distributions of K_N^+ and K_N^-
Although the test has been defined for continuous random variables, it can be
applied to test the uniform distribution of binary sequences. The binary stream x_1 , x_2 , . . . , x_N may be partitioned into blocks x_{ℓk+1} , x_{ℓk+2} , . . . , x_{ℓk+k} of k bits, and each block may be considered as the positional representation of an integer base 2

X = \sum_{i=1}^{k} x_{ℓk+i}\, 2^{i−1} .

The aim is to test the uniform distribution of these integers in the [0, 2^k − 1] interval; equivalently, to look for the uniform distribution in the [0, 1] interval of the rational numbers \frac{X}{2^k − 1}. Note that, in the case of uniform probability density in the [0, 1] interval, the cumulative probability density is F(x) = x.
Test: Given a sequence consisting of N = kL binary symbols, the sequence is partitioned into L blocks of k symbols, which are interpreted as integers, then divided by 2^k − 1 and sorted into ascending magnitude. The randomicity test is passed, with 97% confidence, if both K_N^+ and K_N^- are included in the interval

\left[ 1.22 − \frac{1}{6\sqrt{L}} ,\; 1.52 − \frac{1}{6\sqrt{L}} \right] ,

with L > 200.
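The two statistics are straightforward to compute once the samples are sorted (a sketch, ours; cdf is the hypothesized cumulative distribution F).

```python
import math

def ks_statistics(samples, cdf):
    """K_N^+ = sqrt(N) * max_j (j/N - F(X_j)),
       K_N^- = sqrt(N) * max_j (F(X_j) - (j-1)/N),
       with the samples sorted so that X_1 <= ... <= X_N."""
    xs = sorted(samples)
    n = len(xs)
    kp = math.sqrt(n) * max((j + 1) / n - cdf(x) for j, x in enumerate(xs))
    km = math.sqrt(n) * max(cdf(x) - j / n for j, x in enumerate(xs))
    return kp, km

# five points tested against the uniform law on [0, 1], whose cdf is F(x) = x
kp, km = ks_statistics([0.1, 0.35, 0.4, 0.62, 0.9], lambda x: x)
```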
3.2.1
Linear Complexity Profile
The linear complexity profile, strictly speaking, is not a statistical test. However, it can be seen as a Monte Carlo method, because we compare the actual linear complexity profile with the average (expected) linear complexity profile of a truly random sequence. However, a choice
random sequence.
Linear complexity is defined by referring to the linear sequences generated by the
Linear Feedback Shift Register (LFSR), [34, 65].
[Figure: the linear complexity profile ℓ(n) as a function of n, a staircase close to a straight line of slope 1/2.]
For each sub-sequence of length n, the approach is to compute, starting from
the beginning, its linear complexity, a process that yields the linear complexity
profile.
Since, if X is a genuine random sequence, then ℓ(M) = ⌈M/2⌉ on average, the linear complexity profile is a straight line of slope 1/2.
Let X_n be a subsequence of length n of an infinite truly random sequence X, and let ℓ = ℓ(X_n) denote the length of the shortest recurrence generating X_n. Let g_ℓ(x) = x^ℓ + a_1 x^{ℓ−1} + . . . + a_{ℓ−1} x + a_ℓ be the generator polynomial of X_n.
The linear complexity profile of Xn is defined as the function `(n; k) = `(Xk )
for every k from 0 to n. In order to compute the expectation of `(Xk ), for every
n, it is necessary to define a probability measure over {Xn } the set of all binary
sequences of length n. We say that a sequence Xn is randomly generated if it
is picked at random from {X_n} with probability p{X_n} = 1/2^n, since |{X_n}| = 2^n. This definition is tantamount to considering a sequence X_n as produced bit by bit, with bit probability 1/2. Let c(n; k) denote the number of sequences in {X_n}
that are generated by a recurrence of order k; therefore the expectation of `(Xk )
can be written as
E[ℓ(X_n)] = \frac{1}{2^n} \sum_{X_n} ℓ(X_n) = \frac{1}{2^n} \sum_k k\, c(n; k) .
The last summation is easily computed, taking into account the following observations:
Every generator polynomial of degree k is allowed, [55], including x^k, which is assumed to generate the all-zero sequence.
The sequence 0 · · · 01 composed of n − 1 zeros followed by a 1 is necessarily generated by a recurrence of degree n, [55], therefore c(n; n) = 1, since any other sequence of length n is generated by some LFSR of length less than n [55].
c(n; 0) = 1 is a consequence of the previous observation.
c(1; 1) = 1: since we have only the sequence 1, given that the sequence
0 is generated by a recurrence of order 0, and c(n; 1) = 2, n > 1 since
the recurrence with generator polynomial x + 1 generates two sequences,
namely the all-zero and the all-one sequences, but by definition the all-zero
sequence is generated by a recurrence of order 0, and the sequence 01 cannot
be generated by a recurrence of order 1.
if n > 2k and k ≥ 0, then
c(n; k) = c(n − 1; k) = c(2k; k) ,
because any periodic sequence generated by a LFSR of length k is specified by its first 2k digits, and sequences longer than 2k are the periodic extensions of some sequence generated by a LFSR of length k.
c(2; 2) = 1 and accounts for the sequence 01.
c(3; 2) = 4 is obtained as the difference
c(3; 2) = 2^3 − [c(3; 0) + c(3; 1) + c(3; 3)] = 4 .
Moreover, c(n; 2) = c(4; 2) = 8 for every n ≥ 4, where c(4; 2) is obtained by
direct counting, or repeating the same argument used above for evaluating
c(3; 2).
We have the recurrence c(2k; k) = 4 c(2(k − 1); k − 1), because, adding one cell to a LFSR, we have available one more initial condition and one more tap; therefore
c(2k; k) = 2^{2k−1} ,   k ≥ 1 .
n \ k   0   1   2   3   4   5
  1     1   1
  2     1   2   1
  3     1   2   4   1
  4     1   2   8   4   1
  5     1   2   8  16   4   1
E[ℓ(X_n)] = \frac{1}{2^n} \sum_{k=0}^{n} k\, c(n; k) =
  \frac{n}{2} + \frac{4}{18} − \frac{3n + 2}{9 \cdot 2^n}   (even n)
  \frac{n}{2} + \frac{5}{18} − \frac{3n + 2}{9 \cdot 2^n}   (odd n)
while the mean square deviation σ_n^2, computed through the average of the squares E[ℓ(X_n)^2], tends for large n to the constant value 86/81.
Note that the mean square deviation of the linear complexity profile of a truly random sequence of length n from the straight line of slope 1/2 is therefore approximately 86/81.
Test: Given a block of N binary symbols, construct the linear complexity profile, then compute the mean square distance D from the straight line of equation y = x/2. If this value is close to 86/81, i.e. the difference D − 86/81 is O(1/√N), the test is passed.
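The linear complexity ℓ(X_n) can be computed with the Berlekamp-Massey algorithm (our sketch over GF(2)). Applied to sequence 4) of the Introduction, which was generated by a primitive LFSR of length 4, the profile levels off at 4.

```python
def linear_complexity(bits) -> int:
    """Berlekamp-Massey over GF(2): length of the shortest LFSR
    (linear recurrence) generating the given bit sequence."""
    n = len(bits)
    c, b = [0] * (n + 1), [0] * (n + 1)  # current and previous connection polys
    c[0] = b[0] = 1
    l, m = 0, -1
    for i in range(n):
        # discrepancy between the observed bit and the LFSR prediction
        d = bits[i]
        for j in range(1, l + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            for j in range(i - m, n + 1):
                c[j] ^= b[j - (i - m)]
            if 2 * l <= i:
                l, m, b = i + 1 - l, i, t
    return l

x = [0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # sequence 4)
profile = [linear_complexity(x[:k]) for k in range(1, len(x) + 1)]
print(profile[-1])  # 4: the length of the generating LFSR
```

The profile is non-decreasing by construction, and for a truly random sequence it would keep climbing close to the line of slope 1/2 instead of saturating.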
Chapter 4
Secret-Key Cryptography - Act I
Block ciphers
There is nothing makes a man suspect much,
more than to know little.
Francis Bacon
4.1
Introduction
4.2
The secret key controls the transformations whose aim is to conceal the information. Operatively the key used to encrypt is also used to decrypt; for this reason
these methods are also called symmetric key algorithms. The symmetry is perfect
in the case of binary stream ciphers, when the encryption operation is the XORing of plain and key streams. The design of these encryption mechanisms tries to
satisfy what is known as Kerckhoffs' principle, according to which the security
(secrecy) of a system must lie entirely in the secrecy of the key, and not in the secrecy of the encryption algorithms. Two procedures have emerged, and are seen
as different implementations of the stream enciphering principle, namely block
ciphers, and stream ciphers proper.
Block cipher method: the stream of text symbols is partitioned into blocks of
fixed length, then a transformation is applied to each block of plain text to produce a block of encrypted text of the same length.
[Figure: block encryption E = f(M, K) under the secret key K, and stream encryption, where a stream generator produces the key stream Ki and Ei = f(Mi, Ki).]
4.3
In this section we briefly describe some historically important secret-key encryption systems, which still drive today's evolution of many cryptographic techniques of practical importance.
During its centuries-long evolution, many variants of the Caesar cipher have been proposed. Although all the methods share the same basic principle, they have been given different names, to indicate more explicitly the transformation they perform.
4.3.1
Substitution encryption
Substitution encryption is an enciphering rule such that the letters are substituted
with other letters of the same alphabet. The substitution is called monoalphabetic
when the same letter is always substituted with the same letter of the same alphabet, whatever the position of the letter in the message.
If the same letter in different positions is substituted with different letters, the
encryption scheme is called polyalphabetic.
An example of monoalphabetic substitution is Caesar's cipher. A second example is offered by the simplest use of Alberti's encryption disc.
Monoalphabetic substitution is very weak against statistical attacks based on the
frequencies of the letters in texts of a given language. Polyalphabetic substitution is more resistant to statistical attack; nevertheless, if the text is sufficiently
long it can be broken, as will be explained in Chapter 10.
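A monoalphabetic substitution can be sketched in Python as follows; this is an illustrative fragment (the helper names are ours), with the Caesar cipher obtained as the special case in which the key alphabet is a rotation of the normal one.

```python
import string

def make_substitution(key_alphabet):
    """Build the encryption and decryption maps of a monoalphabetic
    substitution: the i-th plain letter maps to key_alphabet[i]."""
    plain = string.ascii_lowercase
    enc = dict(zip(plain, key_alphabet))
    dec = dict(zip(key_alphabet, plain))
    return enc, dec

def caesar_alphabet(shift):
    """Caesar's cipher: the key alphabet is the normal one rotated by 'shift'."""
    a = string.ascii_lowercase
    return a[shift:] + a[:shift]

enc, dec = make_substitution(caesar_alphabet(3))
ciphertext = "".join(enc[c] for c in "aleaiactaest")
```

Since every plain letter always maps to the same cipher letter, the letter frequencies of the language survive in the ciphertext, which is exactly what the statistical attack exploits.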
4.3.2
Transposition encryption
This is an encryption rule that changes the relative position of the letters without
changing the letters themselves. The main feature of this mechanism is that it is
resistant to statistical attacks based on letter frequencies.
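A simple columnar transposition can be sketched in Python; this is an illustrative fragment (the function name is ours). Note that the ciphertext is a mere rearrangement of the plaintext letters, so letter frequencies are preserved, as remarked above.

```python
def transpose_encrypt(msg, key):
    """Columnar transposition: write the message row-wise into len(key)
    columns, then read whole columns in the order given by the key
    (a permutation of 0, ..., len(key)-1)."""
    cols = len(key)
    rows = [msg[i:i + cols] for i in range(0, len(msg), cols)]
    return "".join(r[k] for k in key for r in rows if k < len(r))
```

For example, with key (1, 0, 2) the message "abcdef" is written into two rows "abc", "def" and read column by column in the order 1, 0, 2.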
4.3.3
Alberti's disk
[Figure: Alberti's cipher disk, with a fixed outer ring carrying the plaintext alphabet and a movable inner ring carrying a scrambled cipher alphabet.]
4.3.4
Vigenère cipher
Vigenère encryption has a relatively long history, having first been devised by Italian
cryptographers of the Renaissance. However, Vigenère's description in his Traité
des chiffres gave it his name, although he did not have priority in the invention of
the method.
The method, in its most elementary formulation, consists of a block of k Caesar
ciphers, each driven by a Caesar key (i.e. an alphabet letter). These k letters
constitute the secret key. Vigenère's original description is the following table,
where in each line the letters of the alphabet are in the normal order, but begin
with a specific letter, the Caesar key; in this example the secret key is ELENA, and
each line is shifted according to the corresponding key letter:
EF GHIJKLM N OP QRST U V W XY ZABCD
LM N OP QRST U V W XY ZABCDEF GHIJK
EF GHIJKLM N OP QRST U V W XY ZABCD
N OP QRST U V W XY ZABCDEF GHIJKLM
ABCDEF GHIJKLM N OP QRST U V W XY Z
For example, Caesar's statement alea iacta est would have been encrypted as
EWINIENXNEWE .
If the message is sufficiently long, and the length of the key is known, a statistical
attack based on the frequency of the letters in a language allows us to break the
Vigenere system systematically. The method will be described in Chapter 10.
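The elementary Vigenère rule described above, i.e. letter-wise modular addition with A = 00, can be sketched in Python as follows (an illustrative fragment, assuming an upper-case plaintext with no spaces):

```python
def vigenere(text, key, decrypt=False):
    """Vigenere cipher: each letter is shifted by the corresponding key
    letter (A = 0), the key being repeated periodically."""
    out = []
    for i, c in enumerate(text):
        k = ord(key[i % len(key)]) - ord('A')
        if decrypt:
            k = -k                      # decryption subtracts the key letter
        out.append(chr((ord(c) - ord('A') + k) % 26 + ord('A')))
    return "".join(out)
```

Decryption is the same operation with the key shifts negated, which makes the symmetry of the scheme explicit.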
4.3.5
Hill Cipher
In [38], Lester Hill proposed the first practical block cipher operating on more
than two symbols. The letters of a plain text are first encoded as numbers of
Zm; a possible m is 31, which leaves room for special characters like commas,
full stops, and blanks. Then a block of n consecutive numbers is considered as
a column vector v and transformed using an invertible matrix H (the Hill matrix) over Zm, producing a vector e = Hv of encrypted numbers, which may
eventually be encoded back to letters. For example, the Latin message
veni,vidi,vici
is first encoded as
21 04 13 08 26 21 08 03 08 26 21 08 02 08 ,
where each number corresponds to the position of the letter in the natural alphabetical order (counting from a = 00), and 26 stands for a comma; a fifteenth
number (e.g. a blank) must be appended to complete the last block of three.
Then the following Hill matrix of determinant 1 in Z31 is used

        [  1   7   5 ]
    H = [  2  14  11 ]
        [ 18   8  11 ]
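Hill encryption with the matrix above can be sketched in Python; this is an illustrative fragment (the function names are ours), with letters encoded as a = 00, ..., z = 25 and the comma as 26, working modulo 31, and leaving aside the padding of the last incomplete block.

```python
H = [[1, 7, 5],
     [2, 14, 11],
     [18, 8, 11]]

def encode(msg):
    """Map letters to 0..25 and ',' to 26, as in the example."""
    return [26 if c == ',' else ord(c) - ord('a') for c in msg]

def hill_encrypt(nums, H, m=31):
    """Multiply successive column vectors of length 3 by H modulo m;
    len(nums) is assumed to be a multiple of 3."""
    out = []
    for i in range(0, len(nums), 3):
        v = nums[i:i + 3]
        out.extend(sum(H[r][c] * v[c] for c in range(3)) % m
                   for r in range(3))
    return out
```

Decryption multiplies by the inverse matrix H^(-1) modulo 31, which exists because the determinant is invertible in Z31.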
4.3.6
Bacon's cipher
Sir Francis Bacon, an English statesman, is also known for his formulation of a set
of principles of good encryption systems [53]. He also proposed an encryption
scheme that anticipates many modern encryption techniques. Bacon's cipher is a
double substitution cipher: the 26 letters of the alphabet are first encoded, using
blocks of five binary symbols, i.e. 0 and 1; then the 26 letters are partitioned into
two sets A0 and A1, each 0 is substituted with any letter of the set A0, and
each 1 is substituted with any letter of the set A1. Note that the encrypted message
is expanded by a factor of 5. For example, the message BACON is first encoded into
binary symbols, where each letter is substituted by the positional binary number
of five bits corresponding to the numbers from 0 to 25, with 00000 corresponding
to A, 00001 corresponding to B, and so on. We partition the 26 letters of the alphabet into two sets, namely A0 = {a, b, . . . , m} and A1 = {n, o, . . . , z}; this partition
is actually the secret key. The encryption is described by the following table
  B     A     C     O     N
00001 00000 00010 01110 01101
agbjn bkhjc bamzf kpyxa dntlw
Decryption can be performed by substituting the letters with the numbers and
then re-interpreting the binary patterns as letters. The secrecy of the partition A0
and A1 is mandatory for the security of the system. However, without further
transformation, a statistical attack is possible, as will be explained in Chapter 10.
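Bacon's double substitution can be sketched in Python with the partition A0, A1 above as the secret key; this is an illustrative fragment (the function names are ours), where the letter standing for each bit is chosen at random inside the corresponding set.

```python
import random

A0 = "abcdefghijklm"   # letters standing for bit 0 (part of the secret key)
A1 = "nopqrstuvwxyz"   # letters standing for bit 1

def bacon_encrypt(msg):
    """Encode each letter as 5 bits, then replace each bit by an arbitrary
    letter of A0 or A1; the ciphertext is five times longer."""
    out = []
    for c in msg.lower():
        bits = format(ord(c) - ord('a'), '05b')
        out.append("".join(random.choice(A1 if b == '1' else A0)
                           for b in bits))
    return "".join(out)

def bacon_decrypt(ct):
    """Recover the bits from the partition, then decode 5 bits per letter."""
    bits = "".join('1' if c in A1 else '0' for c in ct)
    return "".join(chr(int(bits[i:i + 5], 2) + ord('a'))
                   for i in range(0, len(bits), 5))
```

Since only the partition matters, many different ciphertexts decrypt to the same message, but the set membership of each ciphertext letter, and hence the letter statistics, still leaks information, as remarked above.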
4.3.7
One-time pad
The encryption system known as one-time pad was mainly used by spies: people
who have captured our imagination, and are the protagonists of spy novels, in which
names like Mata Hari, Michael Strogoff, or James Bond became famous. Encryption
mechanisms impossible to break are a central part of the novel The Key to Rebecca by Ken Follett (1980).
The idea underlying one-time-pad encryption was first described by Frank Miller
in 1882, and rediscovered after Gilbert Vernam introduced a binary encryption
system for telegraph messages using a punched tape, and Joseph Mauborgne
improved it by requiring that the tape should only be used once. The spy's one-time pad was a book (pad) used only once. Each character of the plain text is encrypted by modular addition with a character taken from the pad and chosen
according to a specific rule from different positions of different pages. In practical
terms, this system achieves perfect Shannon secrecy.
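In modern binary terms the one-time pad reduces to XORing the message with a truly random pad of the same length, used only once; a minimal Python sketch:

```python
import secrets

def otp_encrypt(plaintext: bytes, pad: bytes) -> bytes:
    """XOR each message byte with the corresponding pad byte; the pad
    must be at least as long as the message and must never be reused."""
    assert len(pad) >= len(plaintext)
    return bytes(m ^ k for m, k in zip(plaintext, pad))

msg = b"attack at dawn"
pad = secrets.token_bytes(len(msg))   # truly random, single-use pad
ct = otp_encrypt(msg, pad)
```

Decryption is the same XOR with the same pad, which restores the message; perfect secrecy holds only as long as the pad is random, as long as the message, and never reused.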
4.3.8
Enigma
The Enigma encryption machine now has only historical interest; nevertheless, it represents an example of the combination of many ingenious ideas that overcame the technological limits of the beginning of the 20th century, when electronic and mechanical technologies had not yet reached present-day levels of flexibility and reliability.
Figure 4.4 reports a schematic diagram of the machine:
when a key corresponding to a message letter is pressed, for example a, an electrical circuit is closed and a letter on a screen lights up, showing the encrypted
letter for a. In the figure a current path is shown which induces the lighting of
letter C.
From Figure 4.4 we may deduce that the letter a will never be enciphered as the
letter A, and this occurs for 26 pairs of letters as a consequence of the machine's
structure. This fact could be a source of weaknesses facilitating attacks;
for this reason, an initial permutation was introduced, which was part of the secret key and changed periodically. Therefore, an attacker could not know which
plain-letter/encrypted-letter pairs should be excluded.
4.4
Block ciphers
4.4.1
The most widely used block cipher schemes, namely the governmental DES and AES,
or the proprietary IDEA, have the same basic structure, shown in Figure 4.5, which
was introduced by the famous enciphering machine Enigma. The size of the input buffer is equal to the dimension of the block of symbols to be encrypted at
each stage.
The following box shows a permutation on the input symbols; the inverse of this
permutation is applied at the output.
[Figure 4.4: scheme of principle of the Enigma machine; pressing the key of letter a closes a circuit through the rotors and lights the lamp of the encrypted letter C.]
4.4.2
Modes

          Block               key                 # rounds
Enigma    1 char.             3 char.             32
DES       32 + 32 bits        56 bits             16
AES       (4 × 4) × 8 bits    128-192-256 bits    10, 12, 14
The original conception of DES was for encrypted blocks of symbols to be transmitted, and then transformed independently of one another. The designed data
transform was not dissimilar to a one-way function. Then, once a one-way function was practically available, it was possible to use such a function in different
modes. In particular, five modes have been conceived:
Electronic Code Book (ECB). This mode specifies the basic behavior of a block cipher:
blocks of symbols, usually bits, are transformed into blocks of symbols of
the same length. The transformation is a one-way function that cannot be
inverted without knowing the secret key.
[Figure 4.5: Input buffer → Input permutation P → Round block 1 → ... → Round block n → Output permutation P^(−1) → Output buffer.]
[Figures: block diagrams of the modes of operation; the one-way transformation E = f(M, K) with secret key K is used without feedback, with an internal state V = f(V0, K), or with the output stream ei fed back and XORed with the message stream mi.]
4.5
DES
DES was the result of a project of the NBS, now NIST, with the aim of defining a
standard suitable for the application of cryptography in civil life, business, and commerce. The motivations for a standard recommended by a governmental institution
were multiple:
- To encourage the application of cryptographic protection to communication systems needed for non-military applications.
[Figure: block diagram of DES used as a block cipher, ei = f(mi, K), under the secret key K.]
In its typical application, DES is and acts as a block cipher, where each block of
64 bits of plain text is always encrypted using the same secret key of 56 bits.
In the 1980s, two conferences led by the NBS discussed the mathematical aspects of
DES and the possibility of attacks based on the newly emerging technologies. The
relevant conclusions were:
- DES could be accepted for civil applications, in view of the original design
and targets;
- the limited length of the blocks, and in particular of the key, was sufficient at
that time to prevent exhaustive attacks. However, technological evolution and
increasing computational power would induce changes to DES within a
decade (a prediction that was verified around 2000, with the standardization
of the Rijndael algorithm).
The DES algorithm transforms a sequence of binary symbols into a binary sequence of the same length. In this context, the standard byte representation of
the information is immediately interpreted as blocks of 8 bits. DES performs the
following transformation:
A block of 64 bits is changed by an almost involutory transformation
(in the sense that the inverse transformation requires the inversion of
the round order), specified by the key of 56 bits, into a block of 64
bits. Since the map is invertible, every transformation identified by
the key of 56 bits induces a permutation (which is a product of disjoint
transpositions) on the set of 2^64 blocks of 64 bits.
The set of these permutations generates a subgroup K of the symmetric group Sn
of permutations over n objects, with n = 2^64. Comments on the order of K, which
is not less than 2^56, are deferred to Chapter 10.
4.5.1
DES transformations
[Figure: one round of DES; the halves (Li, Ri) are transformed into
Li+1 = Ri ,   Ri+1 = Li + Q(S(E(Ri) + Ki+1)) .]
The initial permutation of the 64 input bits is defined by the following table
(positions in the first column, values in the rows):

 1- 8:  58 50 42 34 26 18 10  2
 9-16:  60 52 44 36 28 20 12  4
17-24:  62 54 46 38 30 22 14  6
25-32:  64 56 48 40 32 24 16  8
33-40:  57 49 41 33 25 17  9  1
41-48:  59 51 43 35 27 19 11  3
49-56:  61 53 45 37 29 21 13  5
57-64:  63 55 47 39 31 23 15  7
- E(.) is a function that expands the 32 bits of Rj−1 to 48 bits, which are summed
(modulo 2) with the 48 bits of the local key Kj produced starting from k,
following a procedure described below.
- S(.) is a function that compresses the word E(Rj−1) + Kj of 48 bits to a word
of 32 bits. The function S(.) consists of 8 S-boxes working in parallel: it has
been the most criticized function of DES.
From the vector
U = (x1, x2, . . . , x32) ,
the expansion function E generates the vector
E(Rj) = (x32, x1, x2, x3, x4, x5,  x4, x5, x6, x7, x8, x9,
         x8, x9, x10, x11, x12, x13,  x12, x13, x14, x15, x16, x17,
         x16, x17, x18, x19, x20, x21,  x20, x21, x22, x23, x24, x25,
         x24, x25, x26, x27, x28, x29,  x28, x29, x30, x31, x32, x1) .
The permutation Q is defined as follows (positions in the first column, values in the rows):

 1- 8:  16  7 20 21 29 12 28 17
 9-16:   1 15 23 26  5 18 31 10
17-24:   2  8 24 14 32 27  3  9
25-32:  19 13 30  6 22 11  4 25
The 8 S-boxes perform a compression from 48 bits to 32 bits. The 48 bits are
partitioned into 8 blocks of 6 bits each. The two outer bits (the first and the last)
of each block are used as the row address, and the middle four bits are used as
the column address of a 4 × 16 matrix, Tables D.1 and D.2, of the corresponding
S-box, whose elements are integer numbers in the range [0, 15], that is, numbers
that correspond to 4 bits. The output consists of 8 blocks of 4 bits, for a total of 32 bits.
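The addressing rule can be sketched in Python with the first S-box S1 of the standard; this is an illustrative fragment (the function name is ours), following the FIPS 46 convention in which the outer two bits select the row and the middle four bits the column.

```python
# The first DES S-box S1, a 4 x 16 matrix of integers in [0, 15].
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox_lookup(six_bits, box):
    """Map 6 bits to 4 bits: the outer two bits (first and last) give the
    row address, the middle four bits give the column address."""
    row = ((six_bits >> 5) & 1) * 2 + (six_bits & 1)
    col = (six_bits >> 1) & 0xF
    return box[row][col]
```

For example, the all-zero block selects row 0, column 0 of S1 and outputs 14 = 1110.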
4.5.2
The secret key of 56 bits is used to generate the 16 local keys, each of 48 bits,
according to the following scheme:
- The key k of 56 bits is expanded to 64 bits by adding parity bits (or possibly
dummy bits) in positions 8, 16, . . . , 64.
- The 64 bits of the expanded key are permuted by PC1 and, at the same time,
the parity-check bits are discarded, obtaining a block of 56 bits which is
partitioned into two blocks of 28 bits, C0 and D0.
- The blocks Cj and Dj are transformed by cyclic shifts according to the sequence

Round:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Shift:  1 1 2 2 2 2 2 2 1  2  2  2  2  2  2  1
[Tables: the eight DES S-boxes S1, . . . , S8, each a 4 × 16 matrix of integers in the range [0, 15]; see Tables D.1 and D.2. For example, S1 is

14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
 0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
 4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13 ]
From the two blocks considered jointly (Cj | Dj ) as a vector of 56 bits, a key
Kj of 48 bits is extracted by means of the table
14 17 11 24 1 5 3 28 15 6 21 10 23 19 12 4
26 8 16 7 27 20 13 2 41 52 31 37 47 55 30 40
51 45 33 48 44 49 39 56 34 53 46 42 50 36 29 32
The permutation P C1 is defined by the following table
57 49 41 33 25 17 9 1 58 50 42 34 26 18 10 2
59 51 43 35 27 19 11 3 60 52 44 36 63 55 47 39
31 23 15 7 62 54 46 38 30 22 14 6 61 53 45 37
29 21 13 5 28 20 12 4
4.6
AES
AES is the acronym for Advanced Encryption Standard, a block cipher algorithm
certified by NIST, the American National Institute of Standards and Technology.
The algorithm at the basis of AES is a specialization of the Rijndael algorithm,
which was developed by the Belgian researchers Vincent Rijmen and Joan Daemen
in answer to an international competition, launched by NIST in the mid-1990s, with
the aim of selecting a valid substitute for the glorious DES, by then outdated due to
advances in electronics and computing, after twenty years of service.
A major limitation of DES was the small and fixed key length of 56 bits, which
was vulnerable to the exhaustive attacks that could be set up with the ever-increasing
power of computers. The Rijndael algorithm was conceived to perform block
enciphering with different block lengths, to be chosen from among 128, 192, and
256 bits, and, independently, different key lengths, to be chosen from among 128,
160, 192, 224, and 256 bits.
Standard AES, like DES, performs block ciphering; further, like DES but with
greater flexibility, it also enables other cryptographic functions to be performed, such
as the generation of random numbers, the generation of streams, either synchronous or
self-synchronizing, hash functions, and operations at the MAC level.
According to its authors, the design criteria followed in the Rijndael project
were
- strength against all known attacks;
- fast and compact code (programs) on a large class of platforms;
- ease of design and implementation.
AES, like DES, performs a sequence of rounds, that is, a sequence of transformations (like Enigma) over a block of bits. Each round consists of four distinct
transformations, called layers, that perform a passage from the initial state
X = St(i) to the next state St(i + 1) = X'.
Bytes: each byte is viewed as an element of F_2^8, written as a binary polynomial
b7 x^7 + b6 x^6 + b5 x^5 + b4 x^4 + b3 x^3 + b2 x^2 + b1 x + b0 = Σ_{i=0}^{7} bi x^i .
Arrays of bytes: input, output, and key bits for processing are considered as sequences of 16 bytes (or longer, as required by the length of the key)
a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 .
The input bits input_i are grouped eight at a time as follows:
a0 = {input0, input1, input2, input3, input4, input5, input6, input7} .
The bits in each byte are then rewritten in reverse order, as shown in the
third row of the following table, where the second row shows the number
of the byte.
[Table: first row, the input bit index 0, 1, 2, . . .; second row, the byte number 0, 1, . . . , 15; third row, the bit index within each byte, in reverse order 7, 6, . . . , 0.]
State Matrix. St represents the internal state of the machine. This matrix contains the partially transformed block of bits, that is, the output of a round,
which is the input of the next round. St is a 4 × Nb array of bytes, where Nb is
equal to the input block length divided by 32 (Nb is either 4, 6, or 8). Each
byte in the state is denoted Sij, with 0 ≤ i ≤ 3 and 0 ≤ j ≤ Nb − 1. The
initial state is loaded with the bytes input_m, 0 ≤ m ≤ 15, of the input block of
16 bytes (or 24, or 32 bytes), according to the rule
Sij = input_m ,   where i = m mod 4, j = (m − i)/4 .
The number of rounds is Nr = 10, 12, or 14, according to the key length (128, 192, or 256 bits).
4.6.1
Round Transformations
Subbyte operates a nonlinear transformation on a byte a considered as an element of F_2^8 (the transformation is a simple inversion), followed by an affine
transformation on the image byte b, considered as a vector of 8 bits, according
to the rule Subbyte(b) = Ab + a, that is

  [b'0]   [1 0 0 0 1 1 1 1] [b0]   [1]
  [b'1]   [1 1 0 0 0 1 1 1] [b1]   [1]
  [b'2]   [1 1 1 0 0 0 1 1] [b2]   [0]
  [b'3] = [1 1 1 1 0 0 0 1] [b3] + [0]
  [b'4]   [1 1 1 1 1 0 0 0] [b4]   [0]
  [b'5]   [0 1 1 1 1 1 0 0] [b5]   [1]
  [b'6]   [0 0 1 1 1 1 1 0] [b6]   [1]
  [b'7]   [0 0 0 1 1 1 1 1] [b7]   [0]

where the matrix A and the vector a are specified in the Standard.
Shiftrow operates in the following way on the rows of St:
- the first row is not changed;
- the following rows are cyclically shifted by a value that depends on their index,
Sij ← S_{i, j+i} ,
where the column index j + i is taken modulo 4.
Mixcolumn is a transformation operating independently on each column of St.
Each column is written as a polynomial cj (x) of degree 3 with coefficients
in F28 , then cj (x) is multiplied by a fixed polynomial a(x) of degree 3 over
the same field, and relatively prime with x4 + 1, i.e.
a(x) = {03}x3 + {01}x2 + {01}x + {02}
modulo the polynomial x4 + 1.
AddRoundKey is the operation of adding a round key, working by columns, to
the matrix St, according to the rule
s'_c = s_c + w_{4r+c} ,
where r denotes the round number.
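The Mixcolumn layer can be sketched in Python; this is an illustrative fragment (the function names are ours), where the product of a byte by {02} is the classical xtime operation, with reduction modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.

```python
def xtime(b):
    """Multiply a byte by x, i.e. by {02}, in F_2^8 modulo x^8+x^4+x^3+x+1."""
    b <<= 1
    return (b ^ 0x1B) & 0xFF if b & 0x100 else b

def mixcolumn(col):
    """Multiply the column polynomial by a(x) = {03}x^3+{01}x^2+{01}x+{02}
    modulo x^4 + 1, written as the equivalent circulant matrix product."""
    def mul(a, b):                     # product by a constant in {01,02,03}
        return b if a == 1 else xtime(b) if a == 2 else xtime(b) ^ b
    M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
    return [mul(M[r][0], col[0]) ^ mul(M[r][1], col[1])
            ^ mul(M[r][2], col[2]) ^ mul(M[r][3], col[3])
            for r in range(4)]
```

Multiplication by a(x) modulo x^4 + 1 amounts to a circulant matrix whose rows are rotations of ({02}, {03}, {01}, {01}), which is what the sketch implements.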
4.6.2
The cipher key K is expanded to generate the local keys that are used in each
round. The key expansion generates a total of 4(Nr + 1) words: the algorithm
needs an initial set of 4 words, and each of the Nr rounds requires 4 words of key
data. The resulting key schedule consists of a sequence of words of 4 bytes,
denoted [wi], with 0 ≤ i < 4(Nr + 1). Initially [wi] is loaded with the
key columns as
[wi] = (key[4i], key[4i + 1], key[4i + 2], key[4i + 3]) ,
then each word wi is updated using the functions Subword and Rotword, which are the
same used in the rounds, and the round constants Rcon[i/4].
In the key expansion, the round key is a word wi formed by 4 bytes and generated
from the key K (assumed of 128 bits) in the following way:
- w0, w1, w2, and w3 are taken directly from the key K;
- the following words are recursively generated as
wi = vi + w_{i−4} ,
where the sum is done bit-by-bit modulo 2 (XOR), and
vi = w_{i−1}                                  if 4 ∤ i ,
vi = Subword(Rotword(w_{i−1})) + Rcon[i/4]    if 4 | i .
Chapter 5
Secret-Key Cryptography - Act II
Stream ciphers
If one has really technically penetrated a subject, things that previously seemed in complete contrast, might be purely mathematical
transformations of each other.
John von Neumann
5.1
Introduction
Symbol-by-symbol encryption is attractive because of its simple and fast enciphering rule, but it shifts the security problems from the combining function to
the stream generation. The most common stream encryption rule, that is, XORing the
binary information stream with a truly random binary stream, was introduced
by Gilbert Vernam in 1919, along with the notion of the one-time tape (this last
concept is really due to Joseph Mauborgne). However, in practical situations truly
random streams are not available, or are managed with great difficulty: they are
difficult to generate, difficult to exchange on secure channels, and difficult to destroy after use. Since random stream generation has assumed a prominent role,
a large number of structures have been proposed for producing sequences that
look random, including the use of block ciphers in cipher-feedback mode (CFB).
However, in the following only structures based on LFSRs will be dealt with, because of their flexibility and speed. The general structure and features of stream
cipher generators are described as special finite state machines. Due to its key
role, the mathematics of LFSRs is recalled at some length. Typically, these encryption systems have rate 1, that is, the length of the encrypted message is equal to
the length of the plain message. However, a section of this chapter will be devoted to describing an interesting stream enciphering scheme with rate less than 1.
[Figure: a stream cipher generator as a periodic FSM whose output function, driven by the secret key K, produces the key stream Ki, and Ei = f(Mi, Ki).]
5.1.1
The structure
5.1.2
Linear feedback shift registers are particular Finite State Machines (FSM) whose
evolution is governed by linear functions.
Definition 5.1. An autonomous Finite State Machine (FSM) is an object which is described by a quintuple of parameters
{S, A, So , stat, out}
where
1. S is a finite set of states that characterize the machine; it is usually a set of m-dimensional binary vectors u, which are called states;
2. A is a finite set of symbols, the output symbols;
3. S0 ∈ S is a fixed state, the initial state;
4. stat is a mapping from S into S, the next-state function; it describes the transition
from one state to another during the machine's evolution;
5. out is a mapping from S into A, and specifies the output symbol when the machine
is in a given state; it is the output function.
The evolution of an FSM is described by a sequence of states
S : u(0), u(1), . . . , u(n), . . . ,
where u(n) = stat[u(n − 1)], and the output sequence X : x(0), x(1), . . . , x(n), . . .
is generated as x(n) = out(u(n)).
In autonomous evolution, the sequence S is periodic with period the minimum
integer T such that u(n + T) = u(n) for every n, and the output sequence X has a
period N, which is possibly a submultiple of T. If N = T, the function out is said
to be non-degenerate.
5.2
proposition, thus establishing a Boolean algebra. Let Prop.1 stand for Proposition
1, and let X denote its logical value (true or false), then we have
Operation            Boole logic               Boolean algebra over F2
Prop.1 AND Prop.2    X ∧ Y                     x y
Prop.1 OR  Prop.2    X ∨ Y                     x + y + x y
NOT Prop.1           ¬X                        1 + x
Prop.1 XOR Prop.2    (X ∧ ¬Y) ∨ (¬X ∧ Y)       x + y
Definition 5.2. A Boolean function is a function of Boolean variables and takes binary
values in Boolean algebra.
Equivalently, a Boolean function is defined as a function of binary variables taking values in the Galois field F2 .
Given m binary variables v1, v2, . . . , vm, a product of s distinct variables is a monomial of degree s:
v_{j1} v_{j2} · · · v_{js} ,   ji ∈ {1, 2, . . . , m} .
Binary variables are considered only at first degree (that is, they satisfy the formal
identity x² = x); the number of distinct monomials of degree s is the binomial
coefficient (m choose s).
Boolean functions admit normal forms that are useful for their classification. Two
normal forms, called disjunctive and conjunctive normal forms, can be described
in Boolean algebra; a third normal form, called algebraic normal form, is conveniently described in terms of binary variables over F2 .
Theorem 5.1 (Disjunctive normal form). Every Boolean function of m variables can
be expressed as a union of monomials of the Boolean variables and their negations.
Theorem 5.2 (Algebraic normal form). Every Boolean function of m variables can be
expressed as a sum of monomials of binary variables. The number of such functions is 2^(2^m).
Proof. Let f(v1, . . . , vm) be a Boolean function of m variables. The theorem is a
consequence of the following recursive relation, which is self-explanatory:
f(v1, . . . , vm) = vm f(v1, . . . , v_{m−1}, 1) + (1 + vm) f(v1, . . . , v_{m−1}, 0)
                = vm [f(v1, . . . , v_{m−1}, 1) + f(v1, . . . , v_{m−1}, 0)] + f(v1, . . . , v_{m−1}, 0) .
For example, the truth table of a Boolean function f of three variables:

 i    0 1 2 3 4 5 6 7
 fi   0 1 0 1 0 1 1 0
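The recursion in the proof of Theorem 5.2 is, in effect, a binary Möbius transform: applied in place to the truth table of f, it yields the coefficients of the monomials of the algebraic normal form. A minimal Python sketch (the function name is ours):

```python
def anf(truth_table):
    """Algebraic normal form coefficients of a Boolean function, given its
    truth table of length 2^m, via the in-place binary Moebius transform.
    Coefficient j multiplies the monomial whose variables are the 1-bits of j."""
    a = list(truth_table)
    n = len(a)
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            for j in range(i, i + step):
                a[j + step] ^= a[j]   # the recursion f = vm [f1 + f0] + f0
        step *= 2
    return a
```

For instance, the truth table [0, 1, 1, 0] of the two-variable XOR gives the coefficients of the monomials 1, x1, x2, x1 x2, confirming that its ANF is x1 + x2.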
5.3
Inversion resistance mainly rests on the properties of the output functions. However, it may also be reinforced by properly modified LFSR sequences, as we will
see later in this chapter.
The structure of a binary LFSR is described by a matrix that also governs its evolution. Let k be the LFSR length, and let x(n) be a k-dimensional vector defining
the LFSR state at step n. The evolution of a binary LFSR is described by a system
of linear recurrent equations of the first order over F2 , which, in matrix notation,
is
x(n + 1) = M x(n) ,   (5.1)
[Figures: cell diagrams of the Galois, Fibonacci, and tridiagonal LFSR implementations.]
Given a binary polynomial of degree k over F2
p(z) = z^k + a1 z^(k−1) + a2 z^(k−2) + · · · + a_{k−1} z + a_k ,
the following matrix is known as its companion matrix

      [ 0  0  0  . . .  0  a_k     ]
      [ 1  0  0  . . .  0  a_{k−1} ]
  C = [ 0  1  0  . . .  0  a_{k−2} ]
      [ .  .  .         .  .       ]
      [ 0  . . .  1  0     a_2     ]
      [ 0  . . .     1     a_1     ]
and describes the structure of a Galois LFSR, while its transpose C^T describes the
structure of a Fibonacci LFSR. The structure of a tridiagonal LFSR is described by
a binary tridiagonal matrix of the form

      [ d0  1   0   . . .  0       ]
      [ 1   d1  1   . . .  0       ]
  T = [     .   .   .              ]
      [ 0   . . .  1  d_{k−2}  1   ]
      [ 0   0   . . .  1  d_{k−1}  ]

which is uniquely identified by the vector d^t = (d0, d1, . . . , d_{k−1}). In Appendix
III (i.e. Section 5.9) of this chapter it will be shown how to compute d^t from the
characteristic polynomial p(z).
5.3.1
[Equations (5.2)-(5.3): the evolution of the state components xj(n), j = 1, . . . , k, written as a first-order linear system whose characteristic polynomial has coefficients aj.]
The solutions of (5.2) and (5.3) can be obtained from the characteristic equation
of matrix A, that is from its eigenvalues, or by means of a generating function
technique.
Any matrix A satisfies its characteristic polynomial pA(z), that is, pA(A) = O.
The minimal polynomial mA(z) of a matrix A is the polynomial of lowest degree
such that mA(A) = O. The polynomial mA(z) is a divisor of the characteristic
polynomial pA(z). Two square matrices A and B are similar, possibly in some
extension field of their coefficient field, if a nonsingular matrix S exists such that
B = SAS^(−1). Similar matrices have the same characteristic and minimal polynomials.
Example - Let us show the state evolution of three LFSRs with the same generator
polynomial, different implementations, and the same initial state. Consider LFSRs of
length 4 with generating polynomial g(z) = z^4 + z + 1 and initial state (0, 1, 1, 0).
The three sequences of states for the Fibonacci, Galois, and tridiagonal LFSRs, respectively, are
[Tables: the three sequences of 15 four-bit states, for the Fibonacci, Galois, and tridiagonal implementations; each sequence starts from (0, 1, 1, 0) and returns to (0, 1, 1, 0) after 15 steps.]
The period is maximal, that is, T(4) = 2^4 − 1 = 15, and thus does not depend on the initial
state, obviously with the exclusion of the all-zero state.
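The Fibonacci evolution of this example can be sketched in Python; this is an illustrative fragment (the function name and the state convention are ours), where the feedback taps realize the recurrence s(n) = s(n−3) + s(n−4) associated with g(z) = z^4 + z + 1.

```python
def fibonacci_lfsr(state, steps):
    """Fibonacci LFSR for g(z) = z^4 + z + 1: the state holds the last four
    bits; the output is the last cell, and the feedback bit is the XOR of
    cells 2 and 3, realizing s(n) = s(n-3) + s(n-4)."""
    out = []
    state = list(state)
    for _ in range(steps):
        out.append(state[3])
        state = [state[2] ^ state[3]] + state[:3]
    return out
```

Running the register from the initial state (0, 1, 1, 0) for 30 steps, the output repeats with period 15, visiting all nonzero states, since g(z) is primitive.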
5.4
The interplay between binary sequences and linear codes is of great importance
in cryptography. In particular, a fruitful trick for obtaining good binary (nonlinear) sequences, suitable for cryptographic applications, is code puncturing of linear
codes [65]: the operation of puncturing simply excludes (punctures) some
entries of the code vector, in the same positions of every code word. In general,
puncturing a linear code (n, k, d) produces a shorter code (n′, k, d′) with the same
dimension and a smaller minimum distance, i.e. n′ < n and d′ < d. However, if
the length and minimum distance reductions are not exceedingly large, the resulting
code words/sequences are good for cryptography.
It is well known that a block [x(0), x(1), . . . , x(T − 1)] of T = 2^m − 1 consecutive binary symbols in a sequence X produced by an LFSR with primitive generator polynomial g(x) of degree m is a code word of a dual Hamming code
(2^m − 1, m, 2^(m−1)) [54]. A code word is generated, symbol by symbol, during the
evolution of the LFSR, where x(n) = x_o^T u(n) is a linear function of the LFSR state
u(n), x_o being a fixed vector.
Let α be a root of g(x); then B = {1, α, . . . , α^(m−1)} is a basis of F_2^m.
Let B* = {β1, β2, . . . , βm} be the dual basis of B, that is, a basis that satisfies the
conditions
Tr(α^(i−1) βj) = 1 if i = j, and 0 if i ≠ j,
where Tr(·) is the trace function mapping F_2^m onto F2, defined as
Tr(γ) = γ + γ^2 + · · · + γ^(2^(m−1)) .
5.4.1
BCH codes
[Equation: the symbols of a dual BCH code word expressed through the trace function, as a sum Σ_{j=1}^{r} Tr(λj α^(n(2j−1−2r))).]
5.4.2
Goppa codes
Goppa codes (2^m − 1, mt, d) are also closely connected to the generalized Reed-Solomon codes, and can be generated, like the BCH codes, using the trace function [54, Theorem 5, p. 341].
Let G(z) be an irreducible polynomial (the Goppa polynomial) with coefficients
in the Galois field F_2^m, and let L = {α1, α2, . . . , α_{2^m−1}} ⊆ F_2^m be the ordered
location set, with G(γ) ≠ 0 for every γ ∈ L. Let α be a primitive element of F_2^m; then a
dual of a Goppa code has words

( Tr(f(1)/G(1)), Tr(f(α)/G(α)), Tr(f(α^2)/G(α^2)), . . . , Tr(f(α^(2^m −2))/G(α^(2^m −2))) ) .
A code word of C can be viewed as a period of a binary sequence. The linear
complexity, given in a theorem below, is a direct consequence of the following
lemma.
Lemma 5.3. Let G(z) be a polynomial of degree r over F2 that is irreducible over F2m ,
and let F2m be a primitive element. The linear complexity L(g) of the sequence
2m 1
g = { G(1 n ) }
n=0 is lower bounded by m .
Proof. The linear complexity of g is given by the smallest M that allows us to
write
M
X
1
=
nki n = 0 . . . 2m 2
(5.4)
G(n )
i=1
1
for a suitable choice of the set k1 , k2 , . . . kM Z. Let us observe that
=
j
G(n1 2 )
2j
1
for every j less than m, therefore every set of m integers n = n1 2j , j =
G(n1 )
1
0, . . . , m 1 results in a single condition; furthermore, G(
0 ) = 1 implies that M is
2m 1
odd. Therefore equation (5.4) can be seen as a system of m syndrome equations
m
m
that must correct 2 m1 errors in positions ki , thus M 2 m1 .
Theorem 5.5. Let f (z) be a polynomial of degree r 1 with every coefficient j F2m
m
not zero. The limited sequence s = {zn = Tr(f (n )/G(n ))}2n=02 generated by the trace
m
function has a linear complexity L(s) that is lower bounded by 2 m1 .
Proof. Since the conditions of Lemma 5.3 hold, the linear complexity of s is not
less than the linear complexity of g.
5.5
5.5.1
Clock-controlled LFSR
Let A and B be two m × m binary matrices. The simplest, although fairly general,
form of the state transition function of a clock-controlled binary LFSR is

u(n) = [(1 − c(n))A + c(n)B] u(n − 1),

where the binary sequence c = c(0), c(1), ..., c(n), ... is called the control sequence. The sequence c may be generated by an LFSR, or may be extracted from
the sequence u(n) itself; in this case the machine evolution is autonomous and is
called self-clock-controlled. For example, if c(n) = f(u(n − 1)), then the feedback
shift register can be viewed as a nonlinear device.
The state vector of a clock-controlled LFSR can be expressed at step n as a function of the initial state

u(n) = Π_{i=1}^{n} [(1 − c(i))A + c(i)B] u(0).
From now on we will assume that A and B are commuting binary matrices, therefore we have

u(n) = A^{C_0(n)} B^{C_1(n)} u(0),   (5.5)

where C_0(n) and C_1(n) denote the number of 0s and 1s in the sequence c up to
step n.
The period of the sequence S is the minimum integer N such that u(N + n) = u(n)
for every n. Therefore equation (5.5) implies

A^{C_0(N+n) − C_0(n)} B^{C_1(N+n) − C_1(n)} = I

for any n. In the following we will assume that the initial state u(0) belongs to a
periodic cycle, so that we can take n = 0 and write C_0 + C_1 = N = C_0(N) + C_1(N).
Suppose that the control sequence c is generated by a primitive LFSR of length M, and that B = A^{ℓ_B}. Over a whole period the counts of 0s and 1s are

C_0 = K_1 (2^{M−1} − 1)   and   C_1 = K_1 2^{M−1},

and the periodicity condition reads

C_0 + ℓ_B C_1 = K(2^m − 1),

where K and K_1 are the smallest integers that satisfy the condition

K_1 [(ℓ_B + 1) 2^{M−1} − 1] = K(2^m − 1).

If (ℓ_B + 1) 2^{M−1} − 1 and 2^m − 1 are relatively prime, then K_1 = 2^m − 1, and the
period is N = (2^M − 1)(2^m − 1).
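The factorization (5.5) is easy to check experimentally. The sketch below is ours: it uses m = 4, the multiplication-by-α matrix of z^4 + z + 1 (an assumed choice), B = A^3 so that A and B commute, and a random control sequence; it compares the step-by-step evolution with A^{C_0(n)} B^{C_1(n)} u(0):

```python
import random

m = 4
# Multiplication-by-alpha matrix for z^4 + z + 1 (our choice), acting mod 2.
A = [[0, 0, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]

def mat_mul(X, Y):
    return [[sum(X[i][k] & Y[k][j] for k in range(m)) % 2 for j in range(m)]
            for i in range(m)]

def mat_vec(X, v):
    return [sum(X[i][k] & v[k] for k in range(m)) % 2 for i in range(m)]

def mat_pow(X, e):
    R = [[int(i == j) for j in range(m)] for i in range(m)]
    for _ in range(e):
        R = mat_mul(R, X)
    return R

B = mat_pow(A, 3)            # B = A^3 commutes with A
random.seed(1)
c = [random.randint(0, 1) for _ in range(50)]

u0 = [1, 0, 0, 0]
u = u0[:]
for cn in c:                 # u(n) = A u(n-1) if c(n) = 0, else B u(n-1)
    u = mat_vec(B if cn else A, u)

C1 = sum(c)
C0 = len(c) - C1
u_direct = mat_vec(mat_mul(mat_pow(A, C0), mat_pow(B, C1)), u0)
```

Since A and B commute, the two computations must agree, whatever the control sequence is.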
The situation is completely different for the self-clock-controlled LFSR.
5.5.2
Self-Clock-controlled LFSR
Let c be produced by the relation c(n) = f_1(u(n − 1)), with f_1(·) being a function
of the state vector u. Letting B = A^{ℓ_B}, the equation

u(n) = [(1 − f_1(u(n − 1)))A + f_1(u(n − 1))B] u(n − 1)

shows that a self-clock-controlled LFSR is equivalent to a non-linear feedback
shift register, with possibly a shorter period.
Assuming again that the initial state belongs to a periodic cycle, under the above
hypotheses, the period N is defined by the following conditions

N = C_0 + C_1,   C_0 + ℓ_B C_1 = 2^m − 1,
where C_0 and C_1 depend on f_1(·). For instance, in the case B = A^2 and f_1(u(n)) =
u_i(n), the period is obtained as follows: the computation of C_1 is based on the
distribution of runs of 1s (recall that a proper run of 1s having length k consists in
k consecutive 1s between two 0s) in a sequence generated by a primitive LFSR of
length m [34]:

- 1 run of length m
- 0 runs of length m − 1
- 1 run of length m − 2
- 2^{m−k−2} runs of length k, for 1 ≤ k ≤ m − 3.

The resulting count of 1s is

C_1 = (2^m − 1 + 2ε)/3,

where ε is the remainder of 2^m − 1 divided by 3, namely ε = 0 if m is even, and
ε = 1 if m is odd, and finally

C_0 = 2^m − 1 − 2C_1 = (2^m − 1 − 4ε)/3,

N = C_0 + C_1 = (2/3)(2^m − 1) − (2/3)ε.
It is interesting to observe that, when m is even, the control sequence c is perfectly
balanced.
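A self-clock-controlled LFSR with B = A^2 can be simulated on the multiplicative group of F_{2^m}: each step multiplies the state by α or α^2 according to the control bit. The sketch below is ours (m = 4, assumed primitive polynomial z^4 + z + 1, monitored coordinate chosen by us); it finds the cycle through the state 1 and checks the consistency condition C_0 + 2C_1 ≡ 0 (mod 2^m − 1) implied by the period equations above:

```python
MOD, m = 0b10011, 4          # z^4 + z + 1, assumed primitive

def gf_mul(a, b):
    """Carry-less multiplication in F_{2^m}, reduced modulo MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= MOD
    return r

ALPHA = 0b10                  # the element alpha = z

def step(x):
    """One self-clocked step: multiply by alpha if the monitored bit is 0, by alpha^2 if it is 1."""
    c = x & 1                 # control bit f1(u) = u_0 (our choice of coordinate)
    y = gf_mul(x, ALPHA)
    if c:
        y = gf_mul(y, ALPHA)
    return y, c

# Walk from x = 1 until a state repeats, then measure the cycle.
seen, x, n = {}, 1, 0
while x not in seen:
    seen[x] = n
    x, _ = step(x)
    n += 1

# Re-walk the cycle starting from the repeated state, counting control bits.
y, N, C1 = x, 0, 0
while True:
    y, c = step(y)
    N += 1
    C1 += c
    if y == x:
        break
C0 = N - C1
```

Around a cycle the state is multiplied by α^{C_0 + 2C_1}, and since the state is nonzero, this exponent must vanish modulo 2^m − 1.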
5.5.3
In the block [x(0), x(1), ..., x(T − 1)] of one period length, any symbol can be
identified by the corresponding state u(n) = α^{ℓ(n)} of the shift register. Therefore
we write

[x(0), x(1), ..., x(T − 1)] = [x(α^{ℓ(0)}), x(α^{ℓ(1)}), ..., x(α^{ℓ(T−1)})],

with the convention that if u(n) = 0 then ℓ(n) = ∞.
The effect of the clock-control is to exclude some states from the sequence S,
hence to exclude some x(α^i) from the generated sequence. This operation corresponds to puncturing the symbols within a period of the sequence, which is
equivalent to puncturing the underlying linear code.
5.5.4
The state at step n can be expanded over a basis and its conjugates. Let

V = U^{−1} = [ b^T ; σ_1(b^T) ; ... ; σ_{m−1}(b^T) ]

be the inverse matrix of U, where the conjugate rows have been reported. It
follows that b^T a = 1 and σ_i(b^T) a = 0 for any i ≠ 0, hence the state of the
shift register at step n can be written in the form

u(n) = Σ_{i=0}^{m−1} σ_i(α^{n + ℓ_B C_1(n)}) σ_i(a) [σ_i(b^T) u(0)].

Observing that

A^h = Σ_{i=0}^{m−1} σ_i(α^h) σ_i(a) σ_i(b^T),

the output symbol x(n) = r^T u(n) is

x(n) = Σ_{i=0}^{m−1} σ_i(α^{n + ℓ_B C_1(n)}) [r^T σ_i(a)] [σ_i(b^T) u(0)] = Tr([r^T a][b^T u(0)] α^{n + ℓ_B C_1(n)}).   (5.6)
Assuming n > m, these equations show that the linear complexity profile L(n)
of the sequence x(n) is the same as that of the sequence α^{n + ℓ_B C_1(n)} [b^T u(0)], which
is given by the smallest integer L(n) for which we can find a set of exponents k_j
such that the system

α^{n + ℓ_B C_1(n)} = Σ_{j=1}^{t} c_j α^{n k_j}   (5.7)

has a solution with c_j = 0 for any j > L(n). However, given that the coefficient matrix D = (α^{n k_j}) is a Cauchy–Vandermonde matrix, it follows that
[b^T u(0)]^{−1} c_j is zero if and only if its corresponding column in D is equal to the
column of known terms. This is possible only for one column. Therefore, except
when C_1(n) = kn, so that we can take k_j = k, the number of non-zero coefficients
c_j is n. In conclusion, the linear complexity is L(n) = n.
[Figure 5.2: stream enciphering scheme — the source encoder output is masked by a random bit generator and by a stream generator before transmission on the public channel.]
5.6
A stream enciphering scheme with rate less than 1 leaves at our disposal some
redundancy, which permits the introduction of genuine randomness in the enciphering stream. The basic idea is to encode the information using error-correcting
codes, and to add a number of errors within the code error-correcting capability.
Consequently, the information is masked with a random key whose length grows
with the message length. This situation is general for any alphabet; however, it
will be described in detail only for binary stream generators.
A sequence of binary message symbols
x1 , x2 , . . . , xj , . . .
is partitioned into blocks x_i of length k_i(ν) that depends on a random variable ν.
Thus, the plain sequence is composed of binary vectors of variable length
x1 , x 2 , . . . , xi , . . .
each of which will be encoded by means of a linear code (n_i, k_i, d_i) of length depending on the index i; a random pattern of n_i bits and weight not greater than
⌊(d_i − 1)/2⌋ is then added, following the scheme in Figure 5.2, to mask the information
(i.e. to perform the encryption).
Let C be a class of codes C of parameters [n_i(ν), k_i(ν), d_i(ν)], where every code,
together with its parameters, is considered depending on a random variable ν. In
principle, the class C may contain an infinite number of codes.
Each block of k_i(ν) bits is encrypted into a block of n_i(ν) bits, thus the encryption
rate could also be a random variable:

ρ_i(ν) = k_i(ν) / n_i(ν).
However, it will be assumed that the sub-class of codes used has constant rate
ρ_o independent of ν and i. Assume that the codes [n(ν), k(ν), d(ν)] satisfy the
Gilbert–Varshamov bound

k/n ≥ 1 − H_2(d/n),   (5.8)

and assume that, asymptotically, equality is achieved. Using the asymptotic form
of the bound, we have

1 − H_2(δ) = ρ_o,   (5.9)

where δ = d/n, and H_2(δ) is the binary entropy defined as

H_2(δ) = −δ lg_2 δ − (1 − δ) lg_2(1 − δ).
Let G_i(ν) be the generator matrix of the code used to encode x_i at step i, and
e_i(ν) be a vector of errors of length n_i(ν) and Hamming weight not greater than
t = ⌊(d_i(ν) − 1)/2⌋, which is the number of errors that a code with minimum distance
d_i(ν) can correct.
Let C_{ν=s_0} be the initial code defined by means of the secret key K_0, which is also
used to define both the initial state of the stream generator, and the value s_0 of
the random variable ν.
The encryption is performed as follows:

r_i = G_i(ν) x_i + e_i(ν) + v_i,

where v_i is a block of n_i(ν) bits that belongs to a secret stream produced by some
stream generator. Its presence is essential for the protocol, because it masks the
sequence of code words, which belong to codes that, although unknown to the
attacker, may offer some weak point of attack due to their redundancy. The
purpose of the sequence v_i is to make the theoretical performances demonstrated
below effective and practically achievable.
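A minimal sketch of the encryption step r_i = G_i x_i + e_i + v_i; for concreteness we substitute (our simplification) the simplest possible code, a binary [3, 1, 3] repetition code with one random error per block, and a toy keystream in place of the real stream generator. The actual proposal varies the code randomly with ν, which this sketch omits:

```python
import random

random.seed(7)

def keystream(n):
    """Toy stand-in for the secret stream v (NOT a real stream generator)."""
    return [random.randint(0, 1) for _ in range(n)]

def encrypt_bit(x, v):
    code = [x, x, x]                       # G x for the [3,1,3] repetition code
    e_pos = random.randrange(3)            # one genuinely random error (t = 1)
    code[e_pos] ^= 1
    return [c ^ vi for c, vi in zip(code, v)]

def decrypt_bit(r, v):
    c = [ri ^ vi for ri, vi in zip(r, v)]  # strip the keystream v
    return 1 if sum(c) >= 2 else 0         # majority vote corrects the single error

message = [random.randint(0, 1) for _ in range(32)]
ks = [keystream(3) for _ in message]
cipher = [encrypt_bit(x, v) for x, v in zip(message, ks)]
recovered = [decrypt_bit(r, v) for r, v in zip(cipher, ks)]
```

The legitimate receiver, who shares the keystream, decodes within the error-correcting capability; the random error pattern e_i is never transmitted in the clear.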
In the following analysis, the sequence v_i is not considered, because the aim is
to evaluate the strength introduced by the redundancy and the genuinely random errors. Therefore, in the given model, the (binary) mutual information that
a block r of received symbols exchanges with the block x of transmitted information symbols is

I(R, X) = H(R) − H(R|X),

where R is the set of receivable blocks, and X is the set of blocks that are transmitted. Let E be the set of possible genuinely random block errors; we have

H(R) = max{H(X), H(E)},   H(R|X) = H(E).
In these hypotheses, since Plotkin's bound certainly holds true, i.e. d_i/n_i < 1/2, it is
certainly H(X) ≥ H(E), hence the mutual information is

I(R, X) = H(X) − H(E),

with

H(E) = n H_2(δ/2),

where δ/2 ≤ d/(2n) is the rate of correctable errors.
Therefore, perfect encryption is achieved if the condition

ρ_o = H_2(δ/2)

is satisfied. Using Gilbert–Varshamov's inequality, we obtain an upper bound for
ρ_o, and thus the maximum encryption rate that allows perfect encryption is

1 − H_2(δ) ≥ H_2(δ/2).   (5.10)

Solving equation (5.10) taken with the equality sign, we obtain

δ = 0.15135,

thus the encryption rate is upper bounded as

ρ_o ≤ 0.3867.

The meaning of this bound is that of making perfect encryption practically possible without exchanging secret keys of the same length as the message to be
protected, as in the case of the one-time pad. The price to pay is that of sending
useful information at a reduced rate, not greater than 0.3867.
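The threshold δ = 0.15135 and the rate bound 0.3867 can be reproduced numerically by solving 1 − H_2(δ) = H_2(δ/2) with a simple bisection:

```python
from math import log2

def H2(p):
    """Binary entropy H2(p) = -p lg p - (1-p) lg(1-p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def f(d):
    return 1 - H2(d) - H2(d / 2)   # zero at the perfect-encryption threshold

lo, hi = 0.01, 0.49                # f(lo) > 0, f(hi) < 0
for _ in range(60):
    mid = (lo + hi) / 2
    if f(mid) > 0:
        lo = mid
    else:
        hi = mid

delta = (lo + hi) / 2
rate = H2(delta / 2)               # maximum encryption rate
```

The bisection converges to δ ≈ 0.1514 and a maximum rate ≈ 0.3867, matching the figures quoted above.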
Obviously, every exhaustive local attack, aimed at identifying the code used locally, has a complexity which is a function of the code length. Furthermore, in
the proposed scheme, this attack is made even more difficult by the presence of
the pseudo-random stream that masks the genuinely random sequence of errors.
Further, any attack on the code must resolve two issues:
1. The code length is not known, thus the attacker must analyze all codes of
all possible (reasonable) lengths.
2. If the codes are sufficiently long, i.e. of length greater than 500, and are
codes of the Goppa class, with generator matrix transformed with a random secret matrix which depends on the secret key K0 , the complexity of
any known attack is at least equal to the complexity of an attack on the
McEliece public-key scheme, with the further issue that the generator matrix is unknown.
As previously observed, the price to pay is a slower transmission rate, which is
upper bounded by 0.3867. However, the rate loss is not as large as it could appear at first sight, because Shannon perfect enciphering has a net transmission rate of
0.5.
5.7
Consider the companion matrix

    | 0    1       0    ...  0   |
    | 0    0       1    ...  0   |
A = | ...                        |
    | 0    0       0    ...  1   |
    | e_r  e_{r−1}  e_{r−2}  ...  e_1 |
Definition 5.5. Let M(m, Z_2) be the ring of the m × m matrices with binary entries.
The natural representation of F_{2^m} is a homomorphism σ: F_{2^m} → M(m, Z_2) with
kernel the zero element 0 ∈ F_{2^m}. Therefore σ is a linear one-to-one mapping, and the
image of F_{2^m} is precisely a subfield of M(m, Z_2) consisting of matrices obtained as
follows:

(a) σ(α) = A.
(b) σ(1) = I_m and σ(0) = O_m, where I_m and O_m denote the identity and the all-zero
matrix in M(m, Z_2) respectively.
(c) If x = a_0 + a_1 α + ... + a_{m−1} α^{m−1}, a_i ∈ Z_2, then σ(x) = a_0 I_m + a_1 A + ... + a_{m−1} A^{m−1}.

Definition 5.6. Two binary representations σ and τ of the same field F_{2^m} are called
equivalent if there is a matrix U with entries in F_{2^m}, such that τ(x) = U σ(x) U^{−1} for
every x ∈ F_{2^m}.

Definition 5.7. A representation σ of F_{2^m} is said to be reducible over some extension
field F if there is a matrix Q with entries in F such that Q σ(x) Q^{−1}, ∀x ∈ F_{2^m}, is a block
diagonal matrix; otherwise it is said to be absolutely irreducible.
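A sketch of the natural representation for F_{2^3}, with z^3 + z + 1 as our assumed irreducible polynomial: A is built as the multiplication-by-α matrix, σ(x) = a_0 I + a_1 A + a_2 A^2 as in Definition 5.5(c), and the map is checked to be multiplicative:

```python
MOD, m = 0b1011, 3            # z^3 + z + 1 (assumed irreducible polynomial)

def gf_mul(a, b):
    """Carry-less multiplication in F_{2^3}, reduced modulo MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= MOD
    return r

def mat_mul(X, Y):
    return [[sum(X[i][k] & Y[k][j] for k in range(m)) % 2 for j in range(m)]
            for i in range(m)]

# Multiplication-by-alpha matrix: column j holds alpha * alpha^j on the basis (1, alpha, alpha^2).
A = [[(gf_mul(0b10, 1 << j) >> i) & 1 for j in range(m)] for i in range(m)]
I = [[int(i == j) for j in range(m)] for i in range(m)]

def sigma(x):
    """sigma(x) = a0 I + a1 A + a2 A^2 for x = a0 + a1 alpha + a2 alpha^2."""
    R = [[0] * m for _ in range(m)]
    P = I
    for i in range(m):
        if (x >> i) & 1:
            R = [[(R[r][c] + P[r][c]) % 2 for c in range(m)] for r in range(m)]
        P = mat_mul(P, A)
    return R

ok = all(sigma(gf_mul(x, y)) == mat_mul(sigma(x), sigma(y))
         for x in range(8) for y in range(8))
```

Multiplicativity holds because A satisfies A^3 + A + I = 0, so polynomials in A multiply exactly like field elements modulo z^3 + z + 1.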
Proper subfields are represented by matrices having peculiar properties, as specified in the following theorem.
Theorem 5.6. If F2m has a proper subfield F2n , then
i) n|m;
ii) Any natural m-dimensional representation of F_{2^m} is equivalent to a representation in which the matrices associated to elements of F_{2^n} are the outer tensor
product of the n × n matrices of a natural integral representation ρ of F_{2^n} by the
m/n × m/n identity matrix. The characteristic polynomial of σ(x), x ∈ F_{2^n}, is a
power of exponent m/n of the characteristic polynomial of ρ(x).
Proof. Since F_{2^m} is a normal extension of degree m/n of F_{2^n}, F_{2^m} admits a
natural (m/n)-dimensional representation ρ over F_{2^n}. A representation of F_{2^m}
over Z_2 is obtained by composition with σ, namely, the entries of the matrices
of ρ are substituted with the matrices of σ whose entries are in Z_2. Clearly, in
this composition the elements of F_{2^n} consist of block diagonal matrices where the same block
occurs m/n times. The two representations are isomorphic.
5.8
This appendix is a short review of the most common methods for solving linear
recurrent equations.
Let f(n) be a function defined over Z, and assuming its values in F_q, that is, f is a
mapping from Z into F_q. Usually, the values of f form a sequence extending from
−∞ to +∞

..., f(−m), f(−m + 1), ..., f(−1), f(0), f(1), ..., f(n), ...;

however, in many interesting cases, the sequence is bounded on the negative side

f(−m), f(−m + 1), ..., f(−1), f(0), f(1), ..., f(n), ....

Whatever the definition domain may be, a function f(n) is said to be periodic of
period N, if N is the smallest integer such that f(n) = f(n + N) for every n ≥ −m, where
m is possibly ∞.
Definition 5.8. Given
- a function u(n) defined for every n ∈ Z, and assuming values in F_q,
- a fixed set of k numbers h_1, h_2, h_3, ..., h_k which belong to F_q, with h_k ≠ 0,
the recurrent relation

x(n) = Σ_{i=1}^{k} h_i x(n − i) + u(n),   ∀n ∈ Z,   (5.11)

specifies a function x(n), and is called a linear recurrent equation of order k over F_q.
If the relation (5.11) is considered only for every n > k, it is necessary to assign a set of
k initial values for the function x(n)

x(1) = a_1, x(2) = a_2, ..., x(k) = a_k,

called initial conditions, to start the recurrence. The k values a_i may be chosen independently.
Further, the following notions and definitions are useful:
1. A linear recurrent equation is said to be homogeneous if the function u(n)
in (5.11) is identically zero:

y(n) = Σ_{i=1}^{k} h_i y(n − i),   n > k.   (5.12)
4. A solution that satisfies a given set of initial conditions is said to be a particular solution.
It is straightforward, using properties of linearity, to check that the general solution of (5.11), for every n > k, can be written as a sum of two functions

x(n) = y(n) + w(n),

where y(n) is the general solution of the homogeneous equation (5.12), and w(n) is a particular solution of (5.11).
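The decomposition x = y + w can be checked numerically; a quick sketch for the order-2 recurrence x(n) = x(n−1) + x(n−2) + u(n) over F_2 (the recurrence, the initial conditions, and the input are our choices for the example):

```python
import random

random.seed(3)
k, n_steps = 2, 40
a = [1, 0]                       # initial conditions x(1), x(2)
u = [random.randint(0, 1) for _ in range(n_steps)]

def run(init, inp):
    """Iterate x(n) = x(n-1) + x(n-2) + u(n) over F_2 from the given data."""
    seq = list(init)
    for n in range(k, n_steps):
        seq.append((seq[n - 1] + seq[n - 2] + inp[n]) % 2)
    return seq

x = run(a, u)                    # complete solution
y = run(a, [0] * n_steps)        # homogeneous solution, same initial conditions
w = run([0] * k, u)              # particular solution, all-zero initial conditions
```

By linearity, the complete solution is the term-by-term sum of the homogeneous and the particular solutions.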
5.8.1
Generating functions
The generating function of a sequence a(n), n ≥ 0, is the formal power series

A(z) = Σ_{n=0}^{∞} a(n) z^n.

Multiplying both sides of the homogeneous equation (5.12) by z^n and summing over n, we have

Σ_{n=0}^{∞} y(n) z^n = Σ_{n=0}^{∞} { Σ_{i=1}^{k} h_i y(n − i) } z^n = Σ_{i=1}^{k} h_i { Σ_{n=0}^{∞} y(n − i) z^n }
 = Σ_{i=1}^{k} h_i { Σ_{n=i}^{∞} y(n − i) z^n } + Σ_{i=1}^{k} h_i Σ_{n=0}^{i−1} y(n − i) z^n
 = Σ_{i=1}^{k} h_i z^i { Σ_{m=0}^{∞} y(m) z^m } + Σ_{n=0}^{k−1} { Σ_{i=n+1}^{k} h_i y(n − i) } z^n.
Setting

Y(z) = Σ_{n=0}^{∞} y(n) z^n,   H(z) = 1 + Σ_{i=1}^{k} h_i z^i,   I(z) = Σ_{n=0}^{k−1} { Σ_{i=n+1}^{k} h_i y(n − i) } z^n,

the previous identity becomes

Y(z) = Y(z) Σ_{i=1}^{k} h_i z^i + I(z),

whence, over F_2,

Y(z) = I(z) / H(z).   (5.13)
Let d(z) = gcd{I(z), H(z)} be the greatest common divisor of I(z) and H(z),
then

I(z) = d(z) I_1(z)   and   H(z) = d(z) H_1(z);

thus equation (5.13) simplifies to a ratio of polynomials which are relatively
prime:

Y(z) = I_1(z) / H_1(z).   (5.14)
Since H_1(z) is a polynomial over F_2, a minimum integer N exists such that H_1(z)
divides z^N − 1. We can thus write

H_1(z) G_1(z) = z^N − 1,

and, multiplying numerator and denominator of equation (5.14) by G_1(z), we
have

Y(z) = I_1(z) G_1(z) / (H_1(z) G_1(z)) = I_1(z) G_1(z) / (z^N − 1).   (5.15)
Note that the polynomial I_1(z) G_1(z) represents the period of the generated sequence. Further, equation (5.15) used jointly with equation (5.13) yields the following theorem.

Theorem 5.7. Every solution of a homogeneous recurrent equation (5.12) of order k
over F_2 is periodic of period N ≥ 1. The period depends on the polynomial H(z), and
subordinately on the initial conditions:

1) If the initial conditions are the all-zero vector X(0) = 0, the period is N = 1 and is
independent of the polynomial H(z).
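The period N of Theorem 5.7 is the smallest integer such that H_1(z) divides z^N − 1, i.e. the multiplicative order of z modulo H_1(z). A sketch, using H_1(z) = 1 + z + z^3 as our example (a primitive polynomial, whose order is 2^3 − 1 = 7):

```python
H = 0b1011        # H1(z) = 1 + z + z^3, bit i = coefficient of z^i
DEG = 3

def pmul_mod(a, b, mod, deg):
    """Multiply two polynomials over F_2 and reduce modulo `mod` of degree `deg`."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << deg):
            a ^= mod
    return r

def order_of_z(mod, deg):
    """Smallest N with z^N = 1 mod `mod` (exists when the constant term is 1)."""
    p, N = 0b10, 1            # p = z
    while p != 1:
        p = pmul_mod(p, 0b10, mod, deg)
        N += 1
    return N

N = order_of_z(H, DEG)
```

Here the loop finds z^7 = 1 mod (1 + z + z^3), so every nonzero initial condition yields a sequence of period 7.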
For the complete equation (5.11), setting

X(z) = Σ_{n=0}^{∞} x(n) z^n   and   U(z) = Σ_{n=0}^{∞} u(n) z^n,

the same computations give

X(z) = I(z)/H(z) + U(z)/H(z) = I_1(z) G_1(z)/(z^N − 1) + U(z)/H(z).   (5.16)
Equation (5.16) confirms a previous statement that the general solution of (5.11)
can be written as a sum of two functions. More precisely, we have
1) the general solution of the homogeneous equation is given by the generating function

Y(z) = I(z) / H(z),   (5.17)

where I(z) is a polynomial of degree k − 1 whose coefficients are undetermined constants;

2) a particular solution of (5.11), that corresponds to the all-zero initial conditions, is given by the generating function

W(z) = U(z) / H(z).   (5.18)
The method of the generating function may also be directly applied to solve a
system of equations.
Multiply the equation (5.2) by z^n, and sum for n from 0 to infinity:

Σ_{n=0}^{∞} x(n) z^n = A Σ_{n=0}^{∞} x(n − 1) z^n + Σ_{n=0}^{∞} u(n) z^n.

Then setting

X(z) = Σ_{n=0}^{∞} x(n) z^n   and   U(z) = Σ_{n=0}^{∞} u(n) z^n,

where X(z) and U(z) are vectors of k entries which are functions of z, we may
write the equation

X(z) = A x(−1) + zA X(z) + U(z),

from which we obtain

(I + zA) X(z) = A x(−1) + U(z).

Finally, taking the inverse of the matrix I + zA, we obtain the solution of (5.2) as

X(z) = (I + zA)^{−1} A x(−1) + (I + zA)^{−1} U(z).   (5.19)

In the case of the homogeneous equation (5.3), the solution has the same form, with
the term containing the vector U(z) missing.
The generating function technique for solving recurrent linear systems of equations is illustrated by the following examples.
Example - 1. Consider the homogeneous system over F_2

y(n) = | 0 1 | y(n − 1);
       | 1 1 |

a formal solution is

Y(z) = (I + zA)^{−1} A y(−1) = 1/(1 + z + z^2) | 1 + z   z | A y(−1).
                                               |   z     1 |

Taking y(−1) = (a, b)^T, we obtain

Y(z) = 1/(1 + z + z^2) ( b + az, a + b + bz )^T.

Example - 3. Consider the homogeneous system over F_2

y(n) = | 0 1 0 | y(n − 1).
       | 0 0 1 |
       | 1 1 0 |

A formal solution, with constants (c_1, c_2, c_3)^T = A y(−1), is

Y(z) = 1/(1 + z^2 + z^3) | 1 + z^2     z      z^2 | | c_1 |
                         |   z^2       1       z  | | c_2 |
                         |   z      z + z^2    1  | | c_3 |

Assuming y^T(−1) = (1, 0, 0), the constants are easily found: (c_1, c_2, c_3) =
(0, 0, 1); thus the corresponding particular solution is

Y(z) = 1/(1 + z^2 + z^3) ( z^2, z, 1 )^T,

whose components are explicitly

y_1(z) = z^2/(1 + z^2 + z^3) = z^2 + z^4 + z^5 + z^6 + ...
y_2(z) = z/(1 + z^2 + z^3) = z + z^3 + z^4 + z^5 + ...
y_3(z) = 1/(1 + z^2 + z^3) = 1 + z^2 + z^3 + z^4 + ... .
5.8.2
Example - 4. Consider the homogeneous system over F_2

y(n) = | 0 1 | y(n − 1).
       | 1 1 |

The eigenvalues of the coefficient matrix are roots of the characteristic polynomial

ch(λ) = det(λI + A) = det | λ     1   | = λ^2 + λ + 1,
                          | 1   1 + λ |

which is irreducible over F_2 and has the two conjugate roots β and 1 + β = β^2 in F_4,
with β^2 + β + 1 = 0. The corresponding eigenvectors v = (v_1, v_2)^T satisfy

β v_1 + v_2 = 0,   v_1 + (1 + β) v_2 = 0,

and

(1 + β) v_1 + v_2 = 0,   v_1 + β v_2 = 0,

respectively. Finally, the general solution of the given system of recurrent equations is

y(n) = c_1 | 1 + β | β^n + c_2 | β | (1 + β)^n,
           |   1   |           | 1 |

where, in order for the solution to belong to F_2, if c_1 = a + βb, then c_2 = a + b + βb.
Example - 5. Consider the recurrent equation of second order over F_2:

y(n) = y(n − 2),   (5.27)

which has characteristic equation

λ^2 + 1 = 0

with double root λ = 1 in F_2: a solution of (5.27) is y_1(n) = 1^n = 1.
A second linearly independent solution is of the form

y(n) = n λ^n.

Substituting into (5.27) we have

n λ^n = (n − 2) λ^{n−2},

which implies

n(λ^2 + 1) = 0  ⇒  λ = 1.

In conclusion, a general solution of (5.27) is

y(n) = c_1 + n c_2.
5.9
Consider again the homogeneous linear recurrent equation of order k over F_2

y(n) = Σ_{i=1}^{k} h_i y(n − i),   h_k ≠ 0.   (5.28)
Setting

y(n) = y_k(n),  y(n − 1) = y_{k−1}(n),  ...,  y(n − k + 1) = y_1(n),

we obtain the system of equations

y_1(n) = y_2(n − 1)
y_2(n) = y_3(n − 1)
...
y_{k−1}(n) = y_k(n − 1)
y_k(n) = h_1 y_k(n − 1) + h_2 y_{k−1}(n − 1) + ... + h_k y_1(n − 1),

where the last equation has been written making the proper substitutions in
(5.28). Setting y^T(n) = (y_1(n), y_2(n), ..., y_k(n)), we may write the system of equations in matrix form
y(n) = A y(n − 1),   (5.29)
where

    | 0    1        0    ...  0   |
    | 0    0        1    ...  0   |
A = | ...                         |
    | 0    0        0    ...  1   |
    | h_k  h_{k−1}  ...  h_2  h_1 |
is the coefficient matrix. Clearly, the system (5.29) is equivalent to the equation
(5.28), and vice versa. The k × k matrix A is known as the companion matrix of the
polynomial ch(z), the characteristic polynomial of the linear recurrent equation
(5.28). Moreover, det(zI + A) = ch(z), that is, ch(z) is the characteristic polynomial
of A.
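The equivalence between (5.28) and (5.29) is easy to verify numerically; a sketch for the recurrence y(n) = y(n−1) + y(n−3), i.e. our example choice h_1 = 1, h_2 = 0, h_3 = 1:

```python
k = 3
h = [1, 0, 1]                       # h1, h2, h3
A = [[0, 1, 0],
     [0, 0, 1],
     [h[2], h[1], h[0]]]            # companion matrix: last row (h_k, ..., h_1)

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(k)) % 2 for i in range(k)]

# Scalar recurrence from initial values y(1), y(2), y(3).
scalar = [1, 0, 0]
for n in range(3, 30):
    scalar.append((scalar[n - 1] + scalar[n - 3]) % 2)

# Matrix form: the state is (y1, y2, y3)(n) = (y(n-2), y(n-1), y(n)).
state = [1, 0, 0]
matrix_seq = [state[2]]
for _ in range(27):
    state = mat_vec(A, state)
    matrix_seq.append(state[2])
```

The last component of the state vector reproduces y(n) exactly, step by step.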
Definition 5.10. The linear recurring homogeneous system defined as
w(n) = A^T w(n − 1),   (5.30)
where w(n) is a column vector of k unknown functions, is called the conjugate system of
the linear recurring system defined by equation (5.29).
Since A and A^T have the same characteristic polynomial, both conjugate systems
are equivalent to the equation (5.28). This equivalence is evident if we write the
solutions of the systems in matrix form, using the generating functions and considering the transpose of the equation (5.30):

Y(z) = (I + zA)^{−1} A y(−1)   and   W^T(z) = w^T(−1) A (I + zA)^{−1}.
It is useful, for later use, to write the conjugate system (5.30) explicitly

       | 0  0  0  ...  0  h_k     |
       | 1  0  0  ...  0  h_{k−1} |
w(n) = | ...                      | w(n − 1);
       | 0  0  ...  1  0  h_2     |
       | 0  0  ...  0  1  h_1     |

this corresponds to k linear equations of the form

w_1(n) = h_k w_k(n − 1)
w_2(n) = w_1(n − 1) + h_{k−1} w_k(n − 1)
...
w_{k−1}(n) = w_{k−2}(n − 1) + h_2 w_k(n − 1)
w_k(n) = w_{k−1}(n − 1) + h_1 w_k(n − 1).   (5.31)
The cyclic permutation matrix

    | 0  1  0  ...  0 |
    | 0  0  1  ...  0 |
C = | ...             |
    | 0  0  ...  0  1 |
    | 1  0  ...  0  0 |

satisfies the property C^k = I.
Let U be a binary nonsingular matrix; then the matrix B = U A U^{−1} has the same
characteristic polynomial as A. Therefore the linear recurrent system

w(n) = U A U^{−1} w(n − 1),   (5.32)

is equivalent to the linear recurrent equation (5.28). The interesting point is that
the matrix B can have special structures which are useful in many applications.
A particularly interesting structure is the tridiagonal form

    | d_0  1    0    ...     0     |
    | 1   d_1   1    ...     0     |
B = | ...                          |
    | 0   ...   1   d_{k−2}  1     |
    | 0    0   ...   1      d_{k−1} |
We will see that not every polynomial ch(z) of degree k can be the characteristic polynomial of a tridiagonal matrix (i.e. not every linear recurrence (5.28) is
equivalent to a tridiagonal system), although an important class of k-degree polynomials have tridiagonal matrices. In this context, the following matrices will be
useful:

    | d_0  1    0    ...     1     |
    | 1   d_1   1    ...     0     |
F = | ...                          |   (5.33)
    | 0   ...   1   d_{k−2}  1     |
    | 1    0   ...   1      d_{k−1} |

and the diagonal matrix D = diag(d_0, d_1, ..., d_{k−1}), which allows us to write

F = D + C + C^T.

Moreover, we have C^h D C^{−h} = diag(d_h, d_{1+h}, ..., d_{h+k−1}), where the subscripts are
evaluated mod k.
Many important properties of tridiagonal matrices are deduced from their strict
relation with continued fractions. Consider the matrix zI + B, and define
D_{0,k−1}(z) = det(zI + B) as its determinant; in other words, D_{0,k−1}(z) is the characteristic polynomial of B. Furthermore, define D_{h,k−1}(z) as the determinant of the
matrix obtained by suppressing the first h rows and columns from zI + B. Developing the determinant D_{0,k−1}(z) along the last column, we obtain the second
order linear recurrence

D_{0,k−1}(z) = (z + d_{k−1}) D_{0,k−2}(z) + D_{0,k−3}(z);   (5.34)

conversely, developing the determinant D_{0,k−1}(z) along the first column, we obtain the linear recurrence

D_{0,k−1}(z) = (z + d_0) D_{1,k−1}(z) + D_{2,k−1}(z).   (5.35)

This equation, upon division by D_{1,k−1}(z), can be written in the form

D_{0,k−1}(z)/D_{1,k−1}(z) = (z + d_0) + 1/( D_{1,k−1}(z)/D_{2,k−1}(z) ),

which shows that D_{0,k−1}(z) is the numerator of the k-th convergent of a regular
finite continued fraction

z + d_0 + 1/( z + d_1 + 1/( z + d_2 + ... + 1/( z + d_{k−1} ) ) ).

The denominator D_{1,k−1}(z) of the k-th convergent satisfies a second order linear
recurrence

D_{1,k−1}(z) = (z + d_1) D_{2,k−1}(z) + D_{3,k−1}(z).   (5.36)
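The three-term recurrences (5.34)–(5.35) give a fast way to compute det(zI + B) for a tridiagonal B. A sketch over F_2[z], with polynomials represented as bit masks, and the diagonal (d_0, d_1, d_2) = (1, 0, 1) as our example:

```python
def pmul(a, b):
    """Carry-less (F_2[z]) polynomial multiplication, coefficients as bits."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
    return r

def charpoly_tridiagonal(d):
    """det(zI + B) for the tridiagonal B with diagonal d and off-diagonals 1,
    computed by the recurrence P_j = (z + d_j) P_{j-1} + P_{j-2} over F_2."""
    p_prev, p = 1, 0b10 | d[0]       # P_{-1} = 1, P_0 = z + d_0
    for dj in d[1:]:
        p_prev, p = p, pmul(0b10 | dj, p) ^ p_prev
    return p

P = charpoly_tridiagonal([1, 0, 1])   # expected z^3 + z (checked by hand)
```

For k = 2 the recurrence reproduces det[[z+d_0, 1], [1, z+d_1]] = (z+d_0)(z+d_1) + 1 directly.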
The diagonal of B can be reversed without changing the characteristic polynomial, since

K (zI + B) K = zI + B^ρ,   (5.37)

where B^ρ is the tridiagonal matrix with diagonal (d_{k−1}, ..., d_1, d_0), and K is the exchange matrix

    | 0  0  ...  0  1 |
    | 0  0  ...  1  0 |
K = | ...             |
    | 0  1  ...  0  0 |
    | 1  0  ...  0  0 |

satisfying the equation K^2 = I, i.e. K is involutory. The number of k-degree
polynomials is 2^k, but the symmetry shown by (5.37) implies that the number of
tridiagonal matrices with distinct characteristic polynomials is less than 2^k. It follows that not every k-degree polynomial can correspond to a tridiagonal matrix.
This fact poses the interesting problem of characterizing the k-degree polynomials that have tridiagonal matrices. A first answer is given by the following theorem, shown in [11], which uses the formal derivative ch′(z) of the characteristic
polynomial. The proof given here is a shortened version of that given in [11], and
needs a lemma that will be proved as a preliminary.
Lemma 5.4 ([11]). Let w_1 = Σ_{i=0}^{k−1} d_i mod 2 be the trace modulo 2 of F, and w_0 =
k mod 2 be the remainder of the dimension k modulo 2. Let Δ_{0,k−1}(z) = det(zI + F) be
the characteristic polynomial of the matrix F; then

              | q(z)^2            if w_0 = 0, w_1 = 0
Δ_{0,k−1}(z) = | (z + 1) q(z)^2    if w_0 = 1, w_1 = 1   (5.38)
              | z q(z)^2          if w_0 = 1, w_1 = 0
              | z(z + 1) q(z)^2   if w_0 = 0, w_1 = 1
Proof. The proof is by induction on k, and consists of 10 cases. The base cases can
be checked by calculating the characteristic polynomials of all cyclic matrices for
k = 2, 3, 4:

  k   w_1   Δ_{0,k−1}(z)
  2    0    (z + 1)^2 or z^2
  2    1    z^2 + z + 1
  3    0    z(z + 1)^2 or z·z^2
  3    1    (z + 1)z^2 or (z + 1)(z + 1)^2
  4    0    z^4 or (z^2 + z + 1)^2 or (z + 1)^4 or z^2(z + 1)^2
  4    1    z(z + 1)(z + 1)^2 or z(z + 1)z^2

The exception of the 2 × 2 matrices with trace 1 is not used in the recursion for k ≥ 5.
The first eight cases apply when D has at least two equal adjacent elements d_j =
d_{j+1} that, in view of the property

C^h F C^{−h} = C^h D C^{−h} + C + C^T = diag(d_h, d_{1+h}, ..., d_{h+k−1}) + C + C^T,

can be assumed to be d_0 = d_{k−1} by choosing a convenient h. Developing the
determinant det(zI + F) along the first row, we get the relation

Δ_{0,k−1}(z) = (z + d_0) D_{1,k−1} + D_{2,k−1} + D_{1,k−2} = D_{0,k−1} + D_{1,k−2},   (5.39)

which implies Δ_{1,k−1}(z) = D_{1,k−1} + D_{2,k−2} and Δ_{1,k−2}(z) = D_{1,k−2} + D_{2,k−3}; thus
we have

Δ_{0,k−1}(z) = (z + d_0) Δ_{1,k−1}(z) + Δ_{1,k−2}(z).

Consider the case that k is odd and w_1 = 1; supposing d_0 = d_{k−1} = 1, we have

Δ_{0,k−1}(z) = (z + 1) Δ_{1,k−1}(z) + Δ_{1,k−2}(z) = (z + 1) q_1(z)^2 + (z + 1) q_2(z)^2
            = (z + 1)(q_1(z) + q_2(z))^2.

Assuming d_0 = d_{k−1} = 0, we have

Δ_{0,k−1}(z) = z Δ_{1,k−1}(z) + Δ_{1,k−2}(z) = z · z(z + 1) q_1(z)^2 + (z + 1) q_2(z)^2
            = (z + 1)(z q_1(z) + q_2(z))^2.
With similar arguments, we may dispose of the eight cases. The last two cases
appear when it is not possible to have d_0 = d_{k−1}, an event that may occur only
if k is even. In this case we may have D = diag(0, 1, 0, 1, ..., 0, 1). This special
structure gives the equation

Δ_{0,k−1}(z) = z(z + 1) Δ_{2,k−1}(z) + Δ_{2,k−3}(z).

It is straightforward to check that the inductive hypothesis holds. This completes
the proof.
The identity (5.37) implies that two continued fractions of the same length k exist
that lead to the same numerator polynomial D_{0,k−1}(z) of degree k, and to two
denominator polynomials D_{0,k−2}(z) and D_{1,k−1}(z). We may therefore consider
these latter two polynomials as roots of a second degree equation

y^2 + σ_1 y + σ_2 = 0 mod ch(z),

where σ_1 = D_{0,k−2}(z) + D_{1,k−1}(z), and σ_2 = D_{0,k−2}(z) D_{1,k−1}(z). The values of
σ_1 mod ch(z) and σ_2 mod ch(z) will be specified in the next theorem.
Theorem 5.8 ([11]). A polynomial ch(z) is the characteristic polynomial of a tridiagonal
matrix only if the quadratic equation

y^2 + z(z + 1) ch′(z) y + 1 = 0 mod ch(z)   (5.40)

admits a solution in Z_2[z].
Proof. By Lemma 5.4, the characteristic polynomials of the periodic tridiagonal matrices associated with B satisfy

[D_{0,k−1}(z) + D_{1,k−1}(z)] + D_{1,k−2}(z) = λ(z) q_2(z)^2,   (5.41)

[D_{0,k−1}(z) + D_{0,k−2}(z)] + D_{1,k−2}(z) = λ(z) q_3(z)^2,   (5.42)

Δ_{0,k−1}(z) = [D_{0,k−1}(z) + D_{0,k−2}(z) + D_{1,k−1}(z) + D_{1,k−2}(z)] + D_{1,k−2}(z) = D_{0,k−1}(z) + σ_1,   (5.43)

since D_{1,k−2}(z) remains unchanged. Combining (5.41) and (5.42) we have

σ_1 = λ(z) [q_2(z) + q_3(z)]^2 = λ(z) q_1(z)^2,   (5.44)

where λ(z) ∈ {1, z(z + 1), z, z + 1} according to Lemma 5.4. Four cases must be distinguished.

1) λ(z) = 1: then σ_1 = q_1(z)^2, and

D_{0,k−1}(z) = q_1(z)^2 + z(z + 1) q_4(z)^2 = [q_1(z)^2 + z^2 q_4(z)^2] + z q_4(z)^2,   (5.45)

which implies that dD_{0,k−1}(z)/dz = q_4(z)^2, hence

σ_1 = q_1(z)^2 = z(z + 1) dD_{0,k−1}(z)/dz mod D_{0,k−1}(z).

2) λ(z) = z(z + 1): then σ_1 = z(z + 1) q_1(z)^2, and

D_{0,k−1}(z) = z(z + 1) q_1(z)^2 + q_4(z)^2 = [z^2 q_1(z)^2 + q_4(z)^2] + z q_1(z)^2,

which implies that dD_{0,k−1}(z)/dz = q_1(z)^2, hence

σ_1 = z(z + 1) dD_{0,k−1}(z)/dz.

3) λ(z) = z: then σ_1 = z q_1(z)^2, and

D_{0,k−1}(z) = z q_1(z)^2 + (z + 1) q_4(z)^2 = z [q_1(z)^2 + q_4(z)^2] + q_4(z)^2,

which implies that dD_{0,k−1}(z)/dz = q_1(z)^2 + q_4(z)^2, hence

σ_1 = z q_1(z)^2 = z(z + 1) dD_{0,k−1}(z)/dz mod D_{0,k−1}(z).

4) λ(z) = z + 1: then, with a similar decomposition,

σ_1 = (z + 1) q_1(z)^2 = z(z + 1) dD_{0,k−1}(z)/dz mod D_{0,k−1}(z).

In every case σ_1 = z(z + 1) ch′(z) mod ch(z); moreover, the continued-fraction determinant identity D_{0,k−1}(z) D_{1,k−2}(z) + D_{0,k−2}(z) D_{1,k−1}(z) = 1 gives σ_2 = D_{0,k−2}(z) D_{1,k−1}(z) = 1 mod ch(z), which yields equation (5.40).
Corollary 5.1 ([11]). For a polynomial ch(z) of degree n to be the characteristic polynomial of a tridiagonal matrix it is sufficient that (5.40) has at least two solutions q_1(z)
and q_2(z) in Z_2[z] which are polynomials of degree n − 1, and that the continued fraction
expansion of ch(z)/q_1(z) has length n.
The following theorems yield necessary or sufficient conditions for a polynomial
p(z) to be the characteristic polynomial of some tridiagonal matrix. They also
offer an algorithm for constructing the tridiagonal matrix, and show that any
irreducible polynomial has a tridiagonal matrix. The main tool is the Lanczos
tridiagonalization algorithm, which is now briefly recalled from [40], adapted to
our case. Two square matrices A and B are similar if a nonsingular matrix S
exists such that B = S A S^{−1}. For two given column vectors x and y, and a matrix
A, define two Krylov matrices as

K(A, x) = (x, Ax, A^2 x, ..., A^{n−1} x),   K(A^T, y) = (y, A^T y, ..., (A^T)^{n−1} y).

Assume that K(A, x) is nonsingular; then

C_A = K(A, x)^{−1} A K(A, x)

is a companion matrix for the characteristic polynomial of A. Moreover, if R is
any nonsingular upper triangular matrix and S = K(A, x) R, then S^{−1} A S is in
upper Hessenberg form. Suppose also that G = K(A^T, y) is nonsingular and
that K(A^T, y)^T K(A, x) can be written as LDU, in which L is lower triangular, U
is upper triangular, and D is diagonal. Then there exist nonsingular upper
triangular matrices R and Q such that

(K(A, x) R)^{−1} = Q^T K(A^T, y)^T   and the matrix   T = Q^T K(A^T, y)^T A K(A, x) R

is tridiagonal. The matrix G of a companion matrix C is a Hankel matrix, that is,
a matrix of the form

    | h_1  h_2      ...  h_{n−1}   h_n     |
    | h_2  h_3      ...  h_n       h_{n+1} |
H = | h_3  h_4      ...  h_{n+1}   h_{n+2} |
    | ...                                  |
    | h_n  h_{n+1}  ...  h_{2n−2}  h_{2n−1} |
The elimination step is performed by the matrix

        | 1               0  0  ...  0 |
        | a_{21}/a_{11}   1  0  ...  0 |
L_{10} = | 0               0  1  ...  0 |
        | ...                          |
        | 0               0  0  ...  1 |

which substitutes a_{22} with a′_{22} = a_{22} − (a_{21}/a_{11}) a_{12}, and thus a′_{22} is not zero if and only
if a_{22} a_{11} − a_{21} a_{12} ≠ 0. In a similar way, all elements of the first column are made
zero, after which the elements of the second column are made zero. Therefore,
a′_{33} is a multiple of the leading minor of order 3, and must not be zero in order
to zero the elements below it in the third column. The recursion is evident and
enables the proof to be completed.
After the first elimination steps, the Hankel matrix takes the form

| 1    h_2  h_3            ...  h_n               |
| 0    1    h_4 + h_3 h_2  ...  h_{n+1} + h_2 h_n |
| h_3  h_4  h_5            ...  h_{n+2}           |
| ...                                             |
The third leading minor is h_5 + h_3 + h_4 + h_4 h_3 h_2 = 1, which implies h_5 + h_4 + h_2 = 0
because h_2 h_3 = 0, and 1 + h_3 = h_2. The fourth leading minor can be reduced to
the form

| 1    h_2  h_3  h_4           |   | 1  h_2  h_3  h_4             |
| 0    1    h_4  h_5 + h_4 h_2 | = | 0  1    h_4  h_5 + h_4 h_2   |
| 0    0    1    h_6 + h_2 h_4 |   | 0  0    1    h_6 + h_2 h_4   |
| h_4  h_5  h_6  h_7           |   | 0  0    0    h_7 + h_6 + h_2 |

which implies h_7 + h_6 + h_3 = 0. The argument can be iterated to show the claimed
recursive conditions.
Chapter 6
Public-key Cryptography
Nothing doth more hurt in a state than
that cunning men pass for wise.
Francis Bacon
6.1
Introduction
In 1976 two researchers at Stanford University, Whitfield Diffie and Martin Hellman, invented public-key cryptography, with the aim of providing a method
for exchanging secret information (at that time secret keys) on a public channel
without any private agreement. At the origin of Diffie and Hellman's studies was
the explosion, in the 1970s, of the electronic exchange of sensitive documents;
complex banking systems were also calling for inexpensive systems of secret-key
distribution that would avoid couriers or other costly movement of people. But
Diffie and Hellman did more: they introduced a new way of looking at cryptography. The consequences have been very far-reaching, and the basic concept
(although not entirely understood) of one-way function led to many information
protection formats that are indispensable in the modern globalised society, a society dominated by global communication systems, electronic signatures,
message authentication, remote access control, and mobile or cellular phones.
All these remote and secure functions are indispensable in applications such as
e-commerce, home banking, issue of personal documents, e-medicine, and electronic voting.
Diffie and Hellman's discovery triggered very intense activity in the fields of
proposing public-key mechanisms and evaluating their security or strength. This
activity favored a kind of renaissance in subjects such as number theory and discrete mathematics, and great progress was made in domains that had been almost
entirely neglected (or at most the dominion of a selected few) for centuries.
It is claimed that the notion of the public key was already known to the British
Intelligence Services at least 10 years earlier, but that it was kept secret. However,
the merit of having created a totally new discipline and opening a new approach
to cryptography undoubtedly belongs to Diffie and Hellman.
Diffie and Hellman's key exchange. In their seminal paper [20], Diffie and
Hellman proposed, by way of example, a mechanism for public key exchange
on a public channel based on the difficulty of computing the discrete logarithm
in a cyclic group.
The scheme that they invented has become a standard model for describing any
public key cryptographic scheme. The problem:

BOB and ALICE want to establish private communication on a public channel (e.g. the home phone) by means of a stream cipher, and need to share a common
secret key that must be communicated on the same unsafe channel without
any previous agreement.
The solution consists of three steps:
Step 1: BOB and ALICE agree to operate with a cyclic group CN of order N, and
to use a generator α of the group CN.
Step 2: BOB and ALICE perform the following operations
1: ALICE chooses a random number X < N, computes Xp = α^X, and sends
this number to BOB;
2: BOB chooses a random number Y < N, computes Yp = α^Y, and sends
this number to ALICE;
3: ALICE, using Yp, computes the common key as
K = Yp^X = (α^Y)^X = α^{XY}
4: BOB, using Xp, computes the common key as
K = Xp^Y = (α^X)^Y = α^{XY}
Step 3: BOB and ALICE start their private conversation encrypted with a stream
cipher (a secret-key encryption system) using the common secret key K.
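The three steps above can be sketched numerically in the multiplicative group of Z_p; the prime p and the generator (α in the text) are toy assumed values, far too small for any real security.

```python
import secrets

# Toy Diffie-Hellman key exchange in the multiplicative group of Z_p.
# p and alpha are illustrative assumptions, far too small for security.
p = 2579                      # a prime; Z_p* is cyclic of order N = p - 1
alpha = 2                     # assumed generator of the group
N = p - 1

# Step 2: each party picks a secret exponent and publishes alpha^exponent.
X = secrets.randbelow(N - 2) + 2          # Alice's secret
Y = secrets.randbelow(N - 2) + 2          # Bob's secret
Xp = pow(alpha, X, p)                     # Alice -> Bob
Yp = pow(alpha, Y, p)                     # Bob -> Alice

# Each side raises the received element to its own secret exponent.
K_alice = pow(Yp, X, p)                   # (alpha^Y)^X = alpha^(XY)
K_bob = pow(Xp, Y, p)                     # (alpha^X)^Y = alpha^(XY)
assert K_alice == K_bob                   # the shared secret key K
```

The eavesdropper sees p, alpha, Xp, and Yp, but not the exponents X and Y.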
An adversary (typically the bad EVE) who wants to know K observes the public conversation, and thus knows the algebraic domain where the operations are
performed, and three elements of CN
α , Xp , Yp ,
from which she would like to recover the common key K = α^{XY}.
If it is easy to compute the discrete logarithm Lα(.) in CN, then from Xp the
adversary finds X, and then, knowing Yp, she computes K = Yp^X = α^{XY}.
It is believed that computation of the discrete logarithm is generally a difficult
problem, that is, the exponentiation y = α^x and the inverse operation x = Lα(y)
are easy and hard respectively: in other words, they define what is known as a
one-way function. However, concerning the Diffie-Hellman scheme, the
question of whether its strength is equivalent to that of the discrete logarithm has
not yet been settled. It has been conjectured that to compute α^{XY} from α, α^X, and
α^Y is equivalent to the discrete logarithm problem, but the equivalence question
is still open.
6.1.1
One-way functions
The public key concept, just described, is based on the introduction of a novel
notion of function: the one-way function. Intuitively this concept may be accepted
but, for a formal definition of one-way function, we must have at our disposal a
measure of computational complexity that is axiomatically defined, and that can
be used to evaluate the complexity of computing a given function at any point of
its definition domain. A detailed discussion about complexity measures is given
in Chapter 8; for the moment we will assume that such a measure relative to a
function f exists, and the complexity of f is denoted C(f ).
To clarify what is easy and what is hard to compute, we will informally introduce some notions that restrict the vagueness of the concept of one-way function.
Assume that f is defined over a domain D of cardinality N :
We say that f can be easily computed if C(f) = O((log N)^s) for some finite
(possibly small) rational s.
We say that f is hard to compute if C(f) = O(exp(r log N)) = O(N^r) for some
small rational number 0 < r ≤ 1.
Definition 6.1. Let D and F be, possibly different, finite domains. A one-way function
F(.) from D into F is an invertible function whose images can be easily computed, and
whose counter-images are hard to compute. That is
y = F(x) ∈ F is easy to compute for every x ∈ D ,
x = F^{-1}(y) is hard to compute for every y ∈ F(D) ⊆ F .
M = g(X, D) ,
where the pair [f, E] constitutes the public key used by the sender to encrypt, and
the pair [g, D] constitutes the private key used by the addressee to decrypt.
Both functions f and g should be chosen such that, from a knowledge of the
public key, it is not possible to recover either the private key [g, D] or in particular
the message M .
Since the decryption key is different from the encryption key, these algorithms
are said to be asymmetric. The example given by Diffie and Hellman was based
on the function f(x) = α^x, which is an instance of one-way function, because it is
easy to compute α^x, as we will see below, but it is difficult to recover x from f(x).
In many typical realizations, G is the cyclic group of the multiplicative structure
of a finite field, for example of Zp, the field of remainders modulo a prime p, and R
is the additive structure of the ring of remainders modulo p − 1.
A numerical idea, that holds in general, of the contrast between easy and hard
to compute is offered by the complexity of exponentiation and the complexity of
discrete logarithm computation.
Evaluation of powers. Let x be an element of a multiplicative cyclic group C of
order N , and n be a positive integer less than N . The power xn (function exp) can
be evaluated with no more than 2 log2 N multiplications in C [47]. The algorithm
can be easily described:
Setting s = ⌊log2 N⌋, and writing n < N in base 2
n = b0 + b1 2 + b2 2^2 + · · · + bs 2^s ,    bi = 0, 1 ,
we have
x^n = x^{b0} (x^2)^{b1} (x^{2^2})^{b2} · · · (x^{2^s})^{bs} .
The powers
x, x^2, x^{2^2}, . . . , x^{2^i}, . . . , x^{2^s}
can be recursively computed with s squarings, since x^{2^{i+1}} = (x^{2^i})^2, and combined
performing at most s products; note that the powers with exponent bi cost nothing, because bi = 0, 1.
Assuming that the complexity of squaring is equivalent to the complexity of multiplying, the number of products sufficient to compute x^n is at most
s + s = 2⌊log2 N⌋ = O(log2 N) ,
and in conclusion C(exp) = O(log2 N).
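The square-and-multiply algorithm just described can be sketched as follows; the group operation is passed in as a function, here instantiated, as an illustrative assumption, with multiplication modulo 101.

```python
def power(x, n, mul, one):
    """Square-and-multiply: compute x^n with at most 2*floor(log2 n)
    group multiplications, scanning the binary digits b_i of n."""
    result = one
    square = x                       # successively x, x^2, x^4, ...
    while n > 0:
        if n & 1:                    # digit b_i = 1: multiply it in
            result = mul(result, square)
        square = mul(square, square)     # x^(2^(i+1)) = (x^(2^i))^2
        n >>= 1
    return result

# Example in the multiplicative group of Z_101 (toy instantiation):
mod_mul = lambda a, b: (a * b) % 101
assert power(3, 45, mod_mul, 1) == pow(3, 45, 101)   # both give 32
```

The loop runs once per binary digit of n, so the multiplication count matches the 2⌊log2 N⌋ bound above.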
Shanks' algorithm for computing the discrete logarithm. The idea for computing the discrete logarithm with a complexity smaller than that of an exhaustive
search is fairly simple, and was proposed by Daniel Shanks in the 1970s. No better algorithm that works in general cases has yet been found.
Let g be a generator of a cyclic group C of order N. Given x ∈ C, the integer n
such that x = g^n is called the discrete logarithm of x to base g, and is written
n = Lg(x). The discrete logarithm satisfies all the properties of usual logarithms,
although the most typical property, the logarithm of a product, holds modulo N
Lg(xy) = Lg(x) + Lg(y) mod N .
Writing n = n0 + n1⌊√N⌋, with 0 ≤ n0 < ⌊√N⌋, from
x = g^{n0 + n1⌊√N⌋} = g^{n0} g^{n1⌊√N⌋} = g^{n0} (g^{⌊√N⌋})^{n1} ,
we may derive the key equality
x g^{−n0} = (g^{⌊√N⌋})^{n1} ,
which suggests the following procedure for finding n0 and n1 :
1. Construct a table with two columns; in the first column write the variable
n1, with 0 ≤ n1 < ⌈√N⌉, and in the second column the power (g^{⌊√N⌋})^{n1} .
2. Sort the table rows according to the second column.
These operations can be done off line and require a memory of size O(√N).
Note that search in sorted sequences of length m has complexity O(log m).
3. For n0 = 0, 1, 2, . . ., compute x g^{−n0} and search for it in the second column
of the table; when a match is found in row n1, output n = n0 + n1⌊√N⌋.
The time and memory complexities are O(√N log N) and
O(√N), respectively. These rather naive complexity bounds have not been improved after forty years of research, and seem to be the contrasting bounds for
defining one-way functions. Theoretically they cannot be improved, in the sense
that the lower bound order cannot be reduced and the upper bound order cannot
be increased.
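Shanks' procedure can be sketched as follows; a Python dict plays the role of the sorted table, and the toy group Z_101* with generator 2 is an illustrative assumption.

```python
import math

def shanks_log(g, x, p, N):
    """Baby-step giant-step: return n with g^n = x (mod p), where g
    generates a cyclic subgroup of order N of Z_p*; time and memory
    are both O(sqrt(N)). A dict plays the role of the sorted table."""
    m = math.isqrt(N) + 1
    giant = pow(g, m, p)
    # Off-line table: (g^m)^n1 -> n1, indexed by the power for fast search.
    table = {}
    val = 1
    for n1 in range(m):
        table.setdefault(val, n1)
        val = (val * giant) % p
    # On-line search: find n0 such that x * g^(-n0) appears in the table.
    ginv = pow(g, p - 2, p)          # g^(-1) by Fermat's little theorem
    val = x % p
    for n0 in range(m):
        if val in table:
            return (n0 + table[val] * m) % N
        val = (val * ginv) % p
    return None                      # x is not in the subgroup

# Toy example: 2 is a primitive root modulo 101, so N = 100.
n = shanks_log(2, 30, 101, 100)
assert pow(2, n, 101) == 30
```

The table construction and the search each perform about √N group operations, matching the O(√N) bounds discussed above.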
Definition 6.2. The elements of a sub-class of one-way functions satisfying the further
condition of being hard to invert only in the absence of suitable private information, are
called trapdoor one-way functions, or trapdoor functions for short.
Trapdoor functions may be weaker than one-way functions without trapdoor.
The question is open, but what is remarkable is that many applications of the
public key concept necessarily require trapdoor functions.
Since the concept of public key was introduced along with the notion of one-way
function, myriads of algorithms, taken from many different areas of mathematics, have been proposed with the claim that they comprise one-way functions. Most of
these algorithms are based on elementary properties of the integer numbers. The
hypothesis that one-way functions exist rests on the fact that certain functions have never been easy to invert, or that an algorithm that efficiently
computes their inverse has not yet been found. What emerged, and defines the
present status, is that three kinds of problems (all taken from number theory)
provide the following inverse functions, which are difficult to compute:
1. discrete logarithms in cyclic groups
2. factoring of integers
3. searching unsorted data.
At present, no better way to compute these inverse functions, defined on sets of
size N, has been found than to perform skilled exhaustive searches of complexity O(√N). The corresponding direct functions have small complexity, that is,
they may be computed with complexity O(log N). This state of affairs is better
explained in Chapter 8, which is devoted to computational complexity.
Although the existence of one-way functions has never been proved, public key
cryptography is pervading civil life, the economy, and banking systems. Ironically, the security, indeed the very existence, of these systems is based on unproved
assumptions.
In the following sections, the most significant instances of one-way functions
used today will be described.
6.2
The RSA algorithm
In 1977, Rivest, Shamir, and Adleman proposed a one-way function based on the
difficulty of factoring a number that is the product of two primes, which of course should
be large, i.e. each factor should have at least 100 digits. The factorization problem
has been considered difficult since Gauss's time, and much effort has unsuccessfully
been devoted to finding an efficient factoring method. The ingenious idea of Rivest,
Shamir, and Adleman was to turn this difficulty into an advantage for cryptography.
Given two distinct prime numbers p and q, let N = pq be their product, and E be
an integer less than, and relatively prime with, the value φ(N) = (p − 1)(q − 1) of
the Euler totient function of N. If the value b = φ(N) were known, then N could
be factored: since p + q = N + 1 − b, the primes p and q are the roots of the
equation x^2 − (N + 1 − b)x + N = 0, and can be computed using the well known
formula for its roots:
p, q = ( (N + 1 − b) ± √((N + 1 − b)^2 − 4N) ) / 2 .
The outlined public key mechanism is known as the RSA algorithm (from the
initials of the inventors' names: Rivest, Shamir, and Adleman). It is based on the
difficulty of factoring; however, it is not known whether it can be broken without
factoring. In other words, given E and N, it has not been proved that factoring
N and inverting the function f(M) = M^E mod N are polynomially equivalent
problems.
From the practical standpoint, the prime numbers used should have more than
150 digits.
Historical curiosity. In August 1977, in Martin Gardner's Mathematical Games
column in Scientific American, a problem posed by Rivest, Shamir, and Adleman
appeared. It consisted of the following challenge:
Alice broadcasts her public exponent E and her modulus N , where E = 9007 and
N = 1143816257578888676692357799761466120102182967212423625625618429
35706935245733897830597123563958705058989075147599290026879543541
Eve has intercepted the cipher text
C = 9686961375462206147714092225435588290575999112457431987469512093
0816298225145708356931476622883989628013391990551829945157815154
What is the message exciting Eve's curiosity?
This problem became known as the RSA-129 problem, because N has 129 digits.
In order to decipher the message, the only known way to proceed is to factor the
129-digit N into the product of primes.
In April 1994, a team consisting of Derek Atkins, Michael Graff, Arjen Lenstra,
and Paul Leyland succeeded in factoring RSA-129. They used the double large
prime variation of the multiple polynomial quadratic sieve factoring method. The
sieving step was carried out in 8 months by about 600 volunteers from more than
20 countries. The end result was
N = 3490529510847650949147849619903898133417764638493387843990820577
× 32769132993266709549961988190834461413177642967992942539798288533
When decrypted with the secret exponent the message was
The magic words are squeamish ossifrage.
The use of the RSA needs a procedure for key generation, which requires the
generation of large random prime numbers (over 100 digits) and a random number E relatively prime with the Euler totient function of the modulus. It is then
necessary to specify the encryption procedure, which must include an encoding
rule for the alpha-numeric texts. Lastly, a decryption algorithm must be specified
which utilizes the computational resources in some efficient (if possible optimal)
way. The most expensive computations are the modular multiplications of numbers with more than 200 digits.
The procedure is described with the hypothesis that Alice creates the public key,
and Bob wants to send a message encrypted using Alice's public key.
Alice's key generation operations.
1. Randomly choose two prime numbers p and q, with more than 100
digits each
2. Compute the modulus N = pq
3. Compute the Euler totient function φ(N) = (p − 1)(q − 1)
4. Choose E randomly (possibly in the interval [max{p, q} + 1, N − 2]) such
that the greatest common divisor gcd{E, φ(N)} is 1.
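The steps above, together with the computation of the secret exponent D solving ED = 1 mod φ(N), can be sketched with toy parameters; the primes are illustrative assumptions, far below the 100-digit size required in practice.

```python
from math import gcd

# Toy RSA key generation following steps 1-4; illustrative primes only.
p, q = 1009, 2003                  # step 1: two (toy) primes, fixed here
N = p * q                          # step 2: the modulus
phi = (p - 1) * (q - 1)            # step 3: Euler totient function
E = 65537                          # step 4: public exponent
assert gcd(E, phi) == 1
D = pow(E, -1, phi)                # secret exponent: E*D = 1 mod phi(N)

M = 123456                         # a message encoded as a number < N
C = pow(M, E, N)                   # encryption with the public key [N, E]
assert pow(C, D, N) == M           # decryption with the secret key [N, D]
```

Python's three-argument `pow` performs the modular exponentiation by square-and-multiply, so encryption and decryption cost O(log N) multiplications each.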
The idempotents ε1 and ε2 of the CRT decomposition of ZN are defined by the conditions
ε1 = 1 mod p    ε1 = 0 mod q
ε2 = 0 mod p    ε2 = 1 mod q
(6.1)
Thus, the decryption operations may be done working with the moduli p
and q separately
M1 = C^D mod p
M2 = C^D mod q
Observation. In the proof that the RSA encryption and decryption algorithms
work, it is assumed that M is relatively prime with N = pq. Now we prove
that this condition is not necessary, that is, M may be either a multiple of p or a
multiple of q; it cannot be a multiple of both, because it is less than N anyway.
Nevertheless, it seems wise to check the relative primality between M and N: if
by chance they are not relatively prime, the adversary can factor N simply by computing gcd{M, N}.
If gcd{M, N} = 1, the proof that RSA decryption works correctly is based
on the Euler-Fermat theorem.
The following proof, based on the CRT (Chinese Remainder Theorem), shows
that the inversion mechanism of the RSA works equally well for every M < N ,
whether or not they are relatively prime with N .
The encoded message M is decomposed according to the CRT as
M = M1 ε1 + M2 ε2 mod N ,
where M1 = M mod p and M2 = M mod q. If M1 is not divisible by p, writing
ED = 1 + h(p − 1)(q − 1) for some integer h, we have
M1^{ED} = M1 (M1^{(p−1)(q−1)})^h = M1 mod p ,
because M1^{(p−1)} = 1 mod p. While, if M1 is divisible by p, then M1 = 0 mod p and
obviously we have
M1^{ED} = 0 = M1 mod p .
The same argument applies modulo q, therefore C^D = M^{ED} = M mod N for every M < N.
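The CRT decryption just proved can be sketched as follows, deliberately taking M to be a multiple of p; the toy primes are illustrative assumptions.

```python
# RSA decryption through the CRT; M is chosen as a multiple of p to show
# that relative primality with N is not necessary (toy parameters).
p, q = 1009, 2003
N = p * q
D = pow(65537, -1, (p - 1) * (q - 1))

# Idempotents: eps1 = 1 mod p, 0 mod q; eps2 = 0 mod p, 1 mod q.
eps1 = q * pow(q, -1, p)
eps2 = p * pow(p, -1, q)

M = 5 * p                          # a multiple of p, still less than N
C = pow(M, 65537, N)
M1 = pow(C, D % (p - 1), p)        # work mod p; exponent reduced mod p-1
M2 = pow(C, D % (q - 1), q)        # work mod q; exponent reduced mod q-1
assert (M1 * eps1 + M2 * eps2) % N == M
```

Working modulo p and q separately with reduced exponents is how RSA decryption is usually sped up in practice.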
6.3
The Rabin scheme
The Rabin scheme, like the RSA, consists of a one-way function based on the difficulty of factoring. Moreover, unlike the RSA, it will be seen that breaking the
Rabin scheme is equivalent to factoring.
Let N = pq be the product of two primes p and q, which are taken congruent to 3 modulo 4 to avoid computational issues. The Rabin function is defined in the ring of
remainders modulo N as a simple square
C = M^2 mod N .    (6.2)
Given C, if p and q are known, to compute M is easy using the CRT, because the
two equations
u^2 = C mod p
v^2 = C mod q
can be solved easily by computing the powers
u = C^{(p+1)/4} mod p
v = C^{(q+1)/4} mod q .
Indeed, since C is a quadratic residue modulo p, we have C^{(p−1)/2} = 1 mod p, thus
u^2 = C^{(p+1)/2} = C C^{(p−1)/2} = C mod p ,
and a similar verification holds modulo q. The four square roots of C modulo N
are then obtained by the CRT as
z1 = u ε1 + v ε2
z2 = u ε1 − v ε2    (6.3)
z3 = −u ε1 − v ε2
z4 = −u ε1 + v ε2
where ε1 = qa and ε2 = pb have been deduced from the relation qa + pb = 1,
which was produced by the generalized Euclid algorithm.
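The computation of the four roots (6.3) can be sketched with toy primes congruent to 3 modulo 4 (illustrative assumptions).

```python
# Toy Rabin scheme: encryption is a square modulo N = pq, and the four
# roots (6.3) are computed from u, v and the CRT idempotents.
p, q = 1019, 1031                  # toy primes, both congruent to 3 mod 4
N = p * q
eps1 = q * pow(q, -1, p)           # eps1 = 1 mod p, 0 mod q
eps2 = p * pow(p, -1, q)           # eps2 = 0 mod p, 1 mod q

M = 123456                         # message, an element of Z_N
C = pow(M, 2, N)                   # encryption: C = M^2 mod N

u = pow(C, (p + 1) // 4, p)        # square root of C modulo p
v = pow(C, (q + 1) // 4, q)        # square root of C modulo q
roots = {(u * eps1 + v * eps2) % N, (u * eps1 - v * eps2) % N,
         (-u * eps1 - v * eps2) % N, (-u * eps1 + v * eps2) % N}
assert all(pow(z, 2, N) == C for z in roots)
assert M in roots                  # the true message is one of the four
```

The ambiguity among the four roots is exactly what the two extra bits (b0, b1) of the next paragraph are meant to resolve.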
For decryption, selecting the correct value for M requires the knowledge of two
bits (b0, b1). These bits should be computed at the encryption side, without knowing the factorization of N, and sent together with the encrypted message C. A
possible choice, first proposed by Williams [61], is the following:
1. The first bit specifies the parity of M, that is
b0 = M mod 2
because, if the four roots zi are considered modulo N in the interval from 1
to N − 1, two roots are even and two roots are odd.
2. The second bit is obtained by computing the Jacobi symbol (M/N), setting
b1 = ( 1 + (M/N) ) / 2 .
Bob and Alice, who want to communicate privately with respect to Eve, have
public keys NB and NA, respectively. To generate her key, Alice
chooses two large random primes p and q congruent to 3 modulo 4, and computes
NA = p q .
A communication protocol is the following:
Knowledge of two roots zi and zj, with zi ≠ ±zj, allows one to factor N. Since
zi^2 = zj^2 mod N ,    i ≠ j ,
it follows that
(zi + zj)(zi − zj) = 0 mod N .
Necessarily, each of the factors (zi + zj) and (zi − zj) must be divisible by one
and only one of the prime factors of N; then the greatest common divisor gcd{(zi +
zj), N} is one of the factors of N.
An even faster way of looking at this property is to consider equation (6.3):
adding z1 and z2 gives z1 + z2 = 2uε1 = 2uaq, from which the factor q of N can be
recovered simply by computing the greatest common divisor gcd{z1 + z2, N}.
It is remarked that the difficulty of solving second degree equations in ZN is
strictly connected with the fact that the group Z∗N of invertible elements in ZN is
not cyclic.
6.4
The El Gamal scheme
The El Gamal public key encryption algorithm is based on the discrete-log problem, and is a reworking of the Diffie-Hellman public-key exchange that avoids a
key exchange in every conversation. The public key may be stored in a public directory, from which it is retrieved at each conversation, and does not require the
co-presence of both parties for its preparation.
The system is set up by some authority that manages a public directory, generates
a large prime p, chooses a primitive element g of the multiplicative group Z∗p, and
makes public the pair [p, g].
Bob and Alice, who want to communicate privately with respect to Eve, have
public keys B and A, respectively. To generate her key, Alice chooses
a random integer a relatively prime with p − 1 and computes
A = g^a ,
working in the field Fp. A communication protocol is the following:
Bob's encryption operations.
1. Produce a message m encoded as a number of Fp.
2. Retrieve Alice's public key A from the public directory
3. Generate a random number e relatively prime with p − 1
4. Compute the ephemeral key
c1 = g^e ,
5. Compute the masking factor E = A^e
6. Encrypt the message as c2 = m E
7. Send the pair [c1, c2] to Alice.
Alice's decryption operations.
1. Compute z = c1^a
2. Compute the message as m = z^{−1} c2
The algorithm works because of the following chain of identities
z^{−1} c2 = c1^{−a} c2 = g^{−ea} m A^e = g^{−ea} m g^{ae} = m .
The encryption and decryption operations are performed by computing exponentiations and products in Fp, thus their complexity is O(log p).
The adversary, Eve, who wants to know the message, retrieves A, p, and g, and
captures c1 and c2. A possible attack is to find e from c1 by computing a discrete
logarithm, which is an operation of complexity O(√p), and then to obtain the message as
m = c2 A^{−e} .
It is an open question whether it is possible to retrieve m without computing a
discrete logarithm.
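The protocol can be sketched as follows; the prime p and the element g are toy illustrative assumptions.

```python
import secrets

# Toy El Gamal encryption in F_p; p and g are illustrative assumptions.
p = 2579
g = 2                              # assumed primitive element modulo p

a = 765                            # Alice's secret exponent
A = pow(g, a, p)                   # Alice's public key

# Bob's encryption of a message m with an ephemeral exponent e.
m = 1299
e = secrets.randbelow(p - 3) + 2
c1 = pow(g, e, p)                  # ephemeral key
c2 = (m * pow(A, e, p)) % p        # message masked by A^e

# Alice's decryption: z = c1^a, then m = z^(-1) * c2.
z = pow(c1, a, p)
m_dec = (pow(z, p - 2, p) * c2) % p
assert m_dec == m
```

The inverse z^{−1} is computed here by Fermat's little theorem as z^{p−2} mod p; note that a fresh ephemeral e must be drawn for every message.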
6.5
Chapter 7
Electronic signatures
He that will not apply new remedies must expect
new evils; for time is the greatest innovator.
Francis Bacon
7.1
Introduction
version of them, and acceptance is based on a metric (or distance) between measured and stored parameters, an operation that may lead to two basic kinds of
error:
1. accepting the wrong individual (false recognition)
2. not recognizing the right individual (false alarm).
The consequences of error may be very dangerous in both cases. In order to improve the reliability of biometric authentication systems, and to make their use less tedious, the Defense Advanced Research Projects Agency (DARPA) launched
a program, the Active Authentication program, to shift the focus during authentication from the password (including rough forms of biometrics) to the person
using the information systems.
The current standard method for validating a users identity for authentication
on information systems requires humans to do something inherently unnatural:
create, remember, and manage long, complex passwords. Moreover, as long as
sessions remain open, typical systems incorporate no mechanisms to verify that
the person who originally authenticated the access is the same person still in control of the keyboard. Unauthorized individuals can thus improperly obtain access
to information system resources if a password is compromised, or if a user does
not exercise adequate vigilance after the initial authentication.
The new approach seeks to address this problem by changing the game, shifting
the focus during authentication from the password to the person using the information system. This shift would mean, to most people, authentication using
biometric sensors. However, some issues affect the way biometrics is currently
used, thus the challenge is to develop software-based biometrics focused more
on authenticating people than on deploying new sensors.
The DARPA program, lasting from 2011 to 2015, is divided into three phases:
Cognitive Fingerprints. This phase aims to find biometric information that does not
require the installation of additional hardware or special sensors, and to
develop ways to capture aspects of cognitive fingerprints (behavioral traits,
writing speed, posture or gestures) through use of the computer mouse and
keyboard. A further objective is the continuous authentication of users, or
guests.
Going Mobile. The aim is to develop a prototype solution that integrates all
available biometrics using a new authentication platform suitable for deployment on standard desktops and laptops.
Integration, Testing, and Deployment Plans. This phase considers the development of a framework to integrate multiple biometrics on mobile devices as
well as desktops. Extensive Individual Validation and Verification (IV& V)
and adversarial partner efforts would be employed to ensure the authentication platform itself would not be susceptible to attack or misuse.
7.1.1
Remark. In regard to this example, it must be pointed out that the system controlling access to a PC offers very weak protection for the secrecy of the data, because
the data on the mass storage (hard disk) of the computer are not encrypted by default. Therefore, the access authentication process only prevents a passive attack on
the data, but it does not protect against an active attack, for instance: remove the hard disk from the PC, read the data (make a back-up of the disk) using
another computer, and re-install the hard disk in the PC.
7.2
7.2.1
Document
An electronic document to be signed is usually stored encoded in ASCII characters, and for its signature it is necessary to extract a short block of symbols,
called digest and denoted DIG, satisfying the following requirements:
1. DIG must depend on every symbol of DOC.
2. DIG should be easily computed from DOC, e.g. with a function of low
complexity.
3. DIG changes if even a single symbol of DOC is modified.
4. It should not be possible (or should be unfeasible) to modify DOC in such a way that
DIG remains unchanged.
The functions used to compute the digest DIG from a document DOC belong to
the class of hash functions. These functions, ubiquitous in today's cryptographic
systems, are defined considering two domains of numbers D and G.
i = 1, . . . , L .
At the end of the iterations, the last value DL is taken as the digest d.
Note that, in this description, the hash function outputs a single element of the
ring R, which is the digest; therefore a large number of messages is mapped into
the same digest d.
Definition 7.3. A hash function H(.) is said to be collision free if it is computationally
unfeasible to find two messages DOC1 and DOC2 such that H(DOC1 ) = H(DOC2 ).
A further condition, which must hold for practical applications, is that the hash
functions should be easily computable, in order to avoid annoying delays or expensive computational burdens. An interesting example of a hash function for producing the digest H(M) of a message M is the following:
A message M is encoded into a binary stream M = b1, . . . , b_{sm} of sm
bits; each block of m consecutive bits may be interpreted either as an
element of the field F_{2^m} or as an element yi of the ring Z_{2^m−1}, i.e.
yi = Σ_{j=1}^{m} b_{m(i−1)+j} 2^{j−1} mod 2^m − 1
is an element of Z_{2^m−1}.
Where Place, Date, First and Family name are the usual attributes of a signature,
the document number identifies the document publicly, the random number
identifies the signature (several signatures of the same document on the same
day, or several copies of the same document), and the Digest uniquely connects
the signature to the document. Note that the digest guarantees both the integrity
and the authenticity of the document. For example, the above plain signature
may be
Paris&25/10/1811&TomPaine&101745&SP800-67&987354
The Digest is computed from DOC using a hash function producing a single
number d. The hash function must be publicly known, and satisfy two further
conditions:
Given a digest d, it should be difficult to create a document with this digest:
the hash function should be extremely difficult to invert.
Given a message DOC with digest d, it should be computationally unfeasible to find a second message DOC1 with the same digest d.
7.2.2
The hash functions most widely used today are called SHA (Secure Hash Algorithm), followed by a number. The original SHA-1 is a hash function whose
output is 160 bits. Later versions have longer outputs of 224, 256, 384, or 512 bits;
see [60] for the official USA definition of SHA. The original algorithm is briefly
described in the following Table, omitting the specifics of the mixing operations.
Table 7.1 - The SHA-1 Hash Algorithm
Input document DOC as a binary stream
Break DOC (possibly with additional bits appended) into blocks Bj of 512 bits
Define five specific initial values h0 , h1 , h2 , h3 , h4 each of 32 bits (these hi are specified in the standard)
Initialize the chaining variables Hi = hi
LOOP over the blocks Bj
Break Bj into sixteen subblocks (words) of 32 bits
Expand the sixteen words into a total of eighty words w0, w1, . . . , w79 by means of XORs and rotations
LOOP i = 0, 1, . . . , 79
Set a = H0 , b = H1 , c = H2 , d = H3 , e = H4 ,
Compute f as a logical function of b, c, d (i.e. in F2)
Mix a, b, c, d, e by rotating some of their bits, and permuting them
Add f and wi to a.
END i LOOP
Update chaining variables
H0 H0 + a, H1 H1 + b, H2 H2 + c, H3 H3 + d, H4 H4 + e
END j LOOP
Output H0 |H1 |H2 |H3 |H4
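In practice the SHA family is available in standard libraries; the following Python usage sketch (not a re-implementation of Table 7.1) shows the 160-bit output and the sensitivity of the digest to a single changed symbol.

```python
import hashlib

# SHA-1 from the Python standard library: 160 bits = 40 hex digits.
doc = b"The magic words are squeamish ossifrage."
d1 = hashlib.sha1(doc).hexdigest()
d2 = hashlib.sha1(doc + b"!").hexdigest()
assert len(d1) * 4 == 160          # digest length in bits
assert d1 != d2                    # one changed symbol changes the digest
```

Replacing `sha1` with `sha256` or `sha512` yields the later, longer-output versions mentioned above.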
7.3
RSA signature
A signature based on RSA, for generating the authentication A of a plain signature F, exploits the symmetry of the encryption/decryption operations, and the
asymmetry of the encryption/decryption keys of RSA.
Let N = pq be the product of two (large) primes p and q (randomly generated),
and let E be a (large) random number relatively prime with φ(N) = (p − 1)(q − 1).
Assume that the signer has an RSA public key [N, E], and that [N, D] is his secret key, where D has been computed as a solution of the modular equation
ED = 1 mod φ(N). A plain signature F and its authentication A are encoded
as numbers of ZN. The signer uses his secret key to encrypt the plain signature F,
thus producing the authentication A as
A = F^D mod N .
In this case, the electronic signature is the pair of numbers [F, A].
Any individual who wants to verify the signature retrieves the public key [N, E]
of the signer and computes
F′ = A^E mod N ;
if F′ = F, the signature is accepted as valid, otherwise it is rejected.
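The signing and verification operations can be sketched with toy parameters (illustrative assumptions, far too small for practice).

```python
# Toy RSA signature: the signer "encrypts" the plain signature F with the
# secret exponent D; any verifier uses the public key [N, E].
p, q = 1009, 2003                  # toy primes, far too small in practice
N = p * q
E = 65537
D = pow(E, -1, (p - 1) * (q - 1))  # E*D = 1 mod phi(N)

F = 424242                         # plain signature encoded in Z_N
A = pow(F, D, N)                   # authentication produced by the signer

F1 = pow(A, E, N)                  # verifier computes F' = A^E mod N
assert F1 == F                     # signature accepted as valid
```

Only the holder of D can produce A, while anyone holding the public key [N, E] can check it: this is the asymmetry the scheme exploits.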
7.4
Rabin signature
In the Rabin signature of a plain signature F, a quadratic equation of the form
X^2 + hX = U F mod N
is considered, where h and U should be chosen in such a way that the equation
has roots in ZN. Rewriting the equation in the form
(X + h/2)^2 = h^2/4 + U F ,
it is seen that it is not restrictive to set h = 0; thus U should be chosen in such a
way that U F is a quadratic residue modulo N, that is, U F must be a quadratic
residue modulo p, and a quadratic residue modulo q.
The multiplier U is called the padding factor, and may be computed by trial and error,
or deterministically. In this second instance, the procedure is the following:
Let F1 = F mod p and F2 = F mod q, and let (F1/p) denote the Legendre symbol.
Since p is congruent to 3 modulo 4,
( (F1/p) F1 / p ) = (F1/p)^2 = 1 ,
thus (F1/p) F1 is a quadratic residue modulo p. Since a similar result holds for q,
due to the Chinese remainder theorem, a padding factor is
u = (F1/p) ε1 + (F2/q) ε2 .
Nevertheless, this u cannot be used as padding factor as it is, because when
(F1/p) and (F2/q) have different signs, the knowledge of u allows one to factor
N. Therefore, a padding factor is defined as U = R^2 u mod N, with R being a
large random number.
The classic Rabin signature of a message m is then a triple (m, U, S), where U is
a padding factor, found either randomly [61] or deterministically as described
above [23], such that the equation x^2 = mU is solvable, and S denotes one of its
roots. Verification is performed by comparing mU with S^2. Unfortunately, this
efficient mechanism is exposed to easy forgery.
Given a valid signature (m, U, S), a forgery attack computes S^2 (or mU), chooses
any message m′, and computes U′ = S^2 (m′)^{−1} mod N; thus a valid signature
(m′, U′, S) is obtained without knowing the factorization of N. In the original
proposal [62], a hash function H(.) is used instead of m, and S is a solution of
x^2 = H(mU), but this does not help against the above forgery attack.
The Rabin-Williams signature (cf. [31, 82]), which is limited to pairs of primes
where one is congruent to 3 and the other to 7 modulo 8, avoids the vulnerability to forgery. The signature is a four-tuple [m, e, f, S], where e ∈ {1, −1} and
f ∈ {1, 2} are chosen to make the quadratic equation ef S^2 = H(m) mod N solvable in S, where H(.) is a convenient hash function. The non-forgeability is based
on the limited set of multipliers e and f . However, the Rabin-Williams scheme
requires the use of two primes respectively congruent to 3 and 7 modulo 8, while
the classic Rabin signature works with every pair of primes. A possible Rabin
signature that avoids forgery and works for every pair of primes was devised in
[51].
Before introducing a similar scheme, we recall some notions and definitions useful to make the presentation more concise and sound.
Definition 7.4. A signature of a message m, of the form [m, f1, f2, . . . , fℓ], is said to be
weakly non-forgeable if it is not feasible for an outsider to derive from it a valid signature
[m′, f1′, f2′, . . . , fℓ′] for a given message m′.
Definition 7.5. A signature of a message m, of the form [m, f1, f2, . . . , fℓ], is said to be
strongly non-forgeable if it is not feasible for an outsider to derive from it a valid signature
[m′, f1′, f2′, . . . , fℓ′] for some message m′.
In other words, a Rabin signature [m, f1, f2, . . . , fℓ] is strongly non-forgeable if
we cannot derive from it, without knowing the factorization of N, any valid signature
[m′, f1′, f2′, . . . , fℓ′].
Instead, a Rabin signature [m, f1, f2, . . . , fℓ] is weakly non-forgeable if we cannot
derive from it, without knowing the factorization of N, a valid signature for a specified
message m′.
For example, the Rabin-Williams signature [m, e, f, S] is weakly forgeable if the
hash function is the identity function, i.e. H(u) = u, because we can derive a
valid signature as [r^2 m, e, f, rS] for every factor r. But, depending on the hash
function, this signature may be strongly non-forgeable. In the same way, the RSA
signature [m, m^D], where D is the secret counterpart of the public key E, is weakly
forgeable, because we can obtain a valid signature as [r^E m, r m^D] for every factor
r.
7.5
7.6
Blind signature
Blind signature schemes are cryptographic primitives useful in protocols that intend to guarantee the anonymity of the parties [24]. Blind signatures play important roles in e-commerce, e-money, and e-voting procedures. In fact, they were
introduced by Chaum [12] for privacy-related protocols, where signer and message author are different parties.
The blind signature is a form of digital signature in which a message is disguised
before it is signed, while the resulting signature can be publicly verified against
the original message in the manner of a regular digital signature. Formally, a message m is disguised by means of a function d, and then submitted to the signer.
The signed message [d(m), f1, f2, . . . , fℓ] is then made public by the message author, in the form of a valid signed message [m, f1′, f2′, . . . , fℓ′].
In principle, a blind Rabin signature is obtained as follows. Let A be the message
author and B be the signer with Rabin public key N , which is the product of two
Blum primes:
1. A wants the message m to be signed by B, without disclosing the message
itself (or part of the message); he thus chooses a random number r and
submits the disguised message r2 m to the signer.
2. The signer B produces the signed message [r^2 m, u, S], where S is a root of
x^2 = u r^2 m and u is a random padding factor, and sends the signed message
to A.
3. A receives the blindly signed message [r^2 m, u, S] and produces [m, u, S r^{−1} mod N],
the signature for the original message, since (S r^{−1})^2 = u m mod N.
This simple mechanism may be subject to forgery and other kinds of attacks,
for example the RSA blinding attack, which aims at using the blind signature
protocol to decrypt messages that were encrypted using the public key of the
signer.
Further, [24, Proposition 2] shows that the blind signer cannot use a strongly
non-forgeable signature scheme; nevertheless, the open signed message may be
strongly non-forgeable.
Let H(.) be a hash function used by the message author. Consider the following
process:
Public key: [N, H(.)]
Disguised message: r^2 H(m), where m is the original message to be signed, and
r is a random factor chosen by the author. This message is submitted to the
blind signer.
Blindly signed message: [r^2 H(m), F, R^3], where F = RS is the product of two
factors, with R a random number chosen by the signer, and S a root of the
quadratic equation x^2 = r^2 H(m) u, the padding factor u being defined as in
the Rabin signature.
Open signature: the author removes the blinding factor, dividing F by r and R^3
by r^3, obtaining a valid signed message for the original m.
The verification cost is seven squares and three products, plus the evaluation of a
hash function.
The signature of the original message is strongly non-forgeable, and the blind
signature is not vulnerable to the RSA blinding attack, as proved in [24].
7.7

Secret sharing

The secret sharing concept offers, in principle, a way to divide the information required to reproduce a secret key among several people; the secret key may be reconstructed only if a subset of those people (possibly one, or all of them) share their information.
Note that secret sharing may be seen as a signature with endorsement; all signers should be present to produce a valid signature, i.e. to recover the secret. A
method based on linear algebra was devised by Shamir.
Assume that n individuals want to share a secret that can be reconstructed only
when k of them are present. Suppose that a trusted person, generically called
Warden, is the system designer.
Let m ∈ F_q, the secret to be shared, be an element of a finite field. Warden chooses
k − 1 random numbers a_1, a_2, ..., a_{k−1} in F_q and forms the polynomial p_k(x) of
degree k − 1

    p_k(x) = m + a_1 x + a_2 x^2 + ... + a_{k−1} x^{k−1} .

Note that the secret is the value of p_k(x) computed for x = 0.
Warden then generates n random numbers U_i ∈ F_q that have the meaning of public identifiers of each individual, and computes n numbers K_i = p_k(U_i) that have
the meaning of individual secret keys, and are kept secret. These K_i's are the passwords or authenticators. Warden then distributes, on private secure channels, the
pairs [U_i, K_i] to each individual.
When k individuals meet and want to share the secret, they must construct the secret polynomial pk (x), and an efficient method that they may use is the Lagrange
interpolation formula. They form the polynomial
    L_k(x) = ∏_{i=1}^{k} (x − U_{j_i}) ,

and compute the first derivative L_k'(x): the secret polynomial is reconstructed as

    p_k(x) = ∑_{i=1}^{k} ( K_{j_i} / L_k'(U_{j_i}) ) · L_k(x)/(x − U_{j_i}) .
Chapter 8
Complexity
In order for the light to shine so brightly,
the darkness must be present.
Francis Bacon
8.1
Introduction
The notion of complexity is at the root of many areas of engineering and mathematics: two sciences that have different, sometimes conflicting, perspectives from
which to view the concept, and different ways of measuring the difficulties of
problems.
Computational complexity is one of the main topics, if not the principal one, of
computer science. And computational complexity is a fundamental notion indispensable to characterize any cryptographic transformation concerning public key
crypto-systems.
However, the role of computational complexity is also important in the classical
private-key encryption schemes, in particular if we want to measure the system's
strength against different types of attacks.
Unfortunately, a satisfactory measure of computational complexity, analogous to
Shannon's information measure, is still missing. Therefore, some naive measures
of computational complexity have been considered which have no axiomatic basis, although they are heuristically sound. Further, these measures are useful in
practical applications. The consequences drawn from these practical measures
are acceptable and useful; in particular, cryptographic systems designed using
such measures have shown themselves to be reasonably secure.
From a purely theoretical point of view, many attempts have been made to find an
axiomatic theory of computational complexity measure, and probably the most
successful is due to Andrey Nikolaevich Kolmogorov, a preeminent mathematician of the 20th century. Kolmogorov defined a measure of complexity of a
Figure 8.1: Complexity typical trends: logarithmic growth vs. linear growth
8.1.1
a square for every x between 1 and p. We say that the complexity of this approach is exponential because, a priori, the number O(√p) of attempts that we
must make is not a polynomial in the number log_2 p of bits used to represent p.
We say that the complexity is polynomial if the number of attempts or bit operations is O(log_2^r p), [67].
Starting with Euler and Lagrange (through Gauss, Mathews, and later mathematicians of the 20th century), a large number of methods have been developed
for computing a representation p = x^2 + y^2, but every method ultimately has
exponential complexity. In 1982, René Schoof proposed a method based on the
properties of elliptic curves, which obtains the two-square sum representation
of a prime with polynomial complexity O(log_2^9 p).
    d         √d          log_2 d
    1         1.00        0
    10        3.16        3.32
    10^2      10.00       6.64
    10^3      31.62       9.96
    10^4      100.00      13.29
    10^6      1000.00     19.93
    10^9      31622.77    29.89
    10^12     10^6        39.86
    10^24     10^12       79.73
    10^50     10^25       166.09
    10^100    10^50       332.19
In the following sections, a brief description of this state of affairs will be attempted, with no greater scope than providing an introductory view of problems
that are at the very core of present and future mathematical research.
8.2
As Francis Bacon said with nice irony, This art of ciphering, hath for relative an
art of deciphering, by supposition unprofitable, but as things are, of great use. The
aphorism is not an understatement, because the art of ciphering actually consists
in applying a transformation to a text which must be made incomprehensible to
the adversary, and then applying the inverse transformation in order to read the
message. These transformations must meet some requisites, which were listed
by Bacon himself. These principles are the essence of the age-long progress in
the art of cryptography, and Bacon's great merit is that of having collected them
together, ordered them, and expounded them without any obscuring addition.
Bacon's achievement appears even more remarkable, since his conscious
goal was certainly not to create a scientific discipline.
Let M be the message to be encoded, T the cryptographic transformation and E
the ciphered text:
1. T must be easy to perform, that is, it must be easy to obtain E by applying
the transformation T to M, i.e. E = T(M);
2. T must be invertible, that is, given E, it must always be possible to re-obtain
M; this possibility is indicated with the notation M = T^{−1}(E), even if T is
not, properly speaking, invertible in the algebraic sense;
3. T^{−1} must be difficult to calculate without additional information (information that is of a confidential nature in practice).
Without confidential information, the function T must be one-way, that is, easy
to calculate but difficult to invert. The classic solution, dating back to Caesar's
code, consists in making T depend on a parameter K, called the enciphering
key, which must be known only to legitimate system users. Should an adversary
manage to learn K, then he is said to have broken the code. Nevertheless, an
adversary might, given E, succeed in determining M without knowing or managing to discover K. In this scenario, the above requisites are completed by a
further restriction:
4. Knowing M, E, and even T , it must be hard (it must be computationally
difficult) to determine K.
A system that meets these conditions is known as a cryptographically robust system. The chief reason for introducing K was to make the coding system depend
on a limited quantity of information compared to the quantity of information to
be protected. Obviously this economy introduced weaknesses, which have constituted the perennial conflict between coding scheme designers and attackers,
[Figure: model of a secret-key system; the source S is enciphered by T under the key K, which is delivered on a private channel; the ciphertext E travels on the public channel and is deciphered by T^{−1}.]
Information integrity aims at ensuring that the original message is received at the
right time, in the right place, by the intended recipient without distortion or alteration. This feature is sometimes implicit in traditional secret-key communication
systems, although it is a totally independent concept from secrecy.
Information authentication was implicit in the hand-written signature, guaranteeing both the origin and the non-repudiation of a message. This property of public-key
cryptography allows what, in jargon, is called the electronic signature to exist.
While the classic cryptographic functions have also been described in the form of
tricks, all valid solutions concerning public key cryptography are derived from
arithmetic or algebraic properties of numbers.
8.2.1
One-way functions
Two types of one-way function can be considered, which, respectively, characterize classic (i.e. private-key) and modern (i.e. public-key) cryptography.
Let F be a function defined on a domain D = D_1 × D_2 and with values in a domain
R. Let G be its inverse, defined on the domain A = R × D_2 and with values in the
domain D_1. We define:
partial one-way-F: F(., .) is an invertible function of two arguments such that its
image z = F(x, y) is easy to calculate, and the inverse image x = G(z, y)
is easy to calculate knowing y, whereas it is difficult to calculate without
knowing y.
total one-way-F: F(x, y) is an invertible function of two arguments such that the
image z = F(x, y) is easy to calculate, and the inverse image x = G(z, y) is
difficult to calculate even knowing y.
Clearly, these definitions contain the expressions in terms of easy and difficult to calculate, that have not been given explicit values of complexity, and that
might appear ambiguous. Nevertheless, the definition intends to convey the intuitive concepts of easy and difficult. In the examples of one-way functions that
we will give, these notions will be clarified to some extent, using the given definition of arithmetical computational complexity. Anticipating the aspects of easy
and difficult computation as will emerge from the examples, a definition of oneway-F may be
Definition 8.1. An invertible function f from a set D of cardinality N into a set R is
said to be one-way if C_m(f) = O(ln^r N) and C_m(f^{−1}) = O(N^s).
Classical systems, that is to say stream ciphers, DES, AES, etc., and block encryption systems in general, may be described as partial one-way-F(x, y) expressions,
where, substantially, y is the key.
Modern systems, in particular the example of Diffie and Hellman, are models of
total one-way-F(x, y) expressions. All the examples of total one-way functions
are derived from arithmetic or from algebra, and are linked to famous problems
to which a solution is known to exist, but can only be computed with great difficulty. Only three classes of arithmetical and algebraic problems have been, and
still are, used as generators of functions that are candidates for total one-way
functions: 1) Discrete logarithm (DL); 2) Factorization (FP); and 3) Searching unsorted data (SUD).
8.3
Arithmetic complexity
Arithmetical operations in a ring R occur in many areas of modern engineering. In particular, cryptography is based on very elementary operations with
integers, or numbers that can be treated as integers, but that are identified by a
very large number of digits; the complexity of products or exponentiations
ultimately lies in the huge number of digits.
8.3.1
In any group or ring considered multiplicatively, the complexities of two basic operations are of great importance, namely product and exponentiation.
Complexity of product in Z
Let ⊗(a, b) denote the product function, namely ⊗(a, b) = ab is the product of
two integers. Fast multiplication techniques for integers received much attention
after the appearance of computers. A complete account is found in [47], which
although dated can still be considered the most extensive and reliable reference.
The main conclusion is that, if n is the number of bits representing each factor, the
complexity is of the order of n^2, that is C(⊗) = O(n^2). However, it is possible to
do better: the Karatsuba algorithm performs the multiplication with complexity
C(⊗) = O(n^α), with α = ln 3/ln 2 ≈ 1.585. Today's best asymptotic result, of
the lower bound is obtained, for example, for N = 2^{n−1} + 2^{n−2}, in which case it
is clear that n − 1 squares will be needed to calculate the power g^{2^{n−1}}, and a
further product to obtain g^N;
the upper bound is obtained by noting that n − 1 squares will be needed to
obtain the powers g^2, g^4, ..., g^{2^{n−1}}, and at most n − 1 products to calculate
g^N from the above powers.
This algorithm for computing powers, also known as the LR binary method for exponentiation, already appeared in Legendre's book Théorie des Nombres; however, a
very similar algorithm was used by the Egyptians to multiply integers.
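The LR (left-to-right) binary method itself is short enough to state in full; the bounds above correspond to one squaring per exponent bit and one product per set bit.

```python
# Left-to-right binary (square-and-multiply) exponentiation: at most
# n - 1 squarings and n - 1 products for an n-bit exponent N.
def lr_power(g, N, mod):
    result = 1
    for bit in bin(N)[2:]:               # scan exponent bits left to right
        result = result * result % mod   # always square
        if bit == '1':
            result = result * g % mod    # multiply when the bit is set
    return result

assert lr_power(3, 1000003, 2**31 - 1) == pow(3, 1000003, 2**31 - 1)
```

The same scheme works verbatim in any multiplicative group or ring, with the modular product replaced by the group operation.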
Remark 1. The minimum number of squares and products in the domain containing x, sufficient to compute x^n, is related to the addition chain for n [47, p. 402],
namely a sequence of positive integers a_j, beginning with a_0 = 1, such that a_j is
the sum of two previous elements in the chain, either distinct or not (in which
case the operation is said to be a doubling),

    a_j = a_{j_1} + a_{j_2} ,   j_1, j_2 < j ,

and ending with a_r = n. The number r is said to be the length of the addition
chain, and the length of the shortest addition chain for n is denoted ℓ(n). This
minimum ℓ(n) yields the minimum number of products needed for computing
x^n. Given n, apparently no general efficient algorithm is known for finding a
shortest chain ending with a_r = n. Most of the results concern special forms of n; for
example, if n = 2^A + 2^B + 2^C, with A > B > C, then ℓ(n) = A + 2. Obviously,
the shortest chains for n = 2^m or n = 2^m + 1 are well known, and have lengths m and
m + 1, respectively. Many open conjectures related to n = 2^m − 1 exist, the most
notorious being the Scholz–Brauer conjecture ℓ(2^n − 1) ≤ n − 1 + ℓ(n), [47, p. 459].
Remark 2. The LR binary method for exponentiation can also be conveniently applied to evaluate the power X^q modulo a polynomial of low degree, g(X) = X^n +
a_1 X^{n−1} + ... + a_0, over any field, preferably a finite field.
8.3.2
8.4
Factorization complexity
The first type of arithmetic problem that may be used to create a one-way function is factoring.
In his Disquisitiones Arithmeticae, at item 329 [33, p. 396], Gauss recognized the importance of factoring:
The problem of distinguishing prime numbers from composite numbers and
of resolving the latter into their prime factors is known to be one of the most
important and useful in arithmetics. It has engaged the industry and the
wisdom of ancient and modern geometers to such an extent that it would be
superfluous to discuss the problem at length. Nevertheless we must confess
that all methods that have been proposed thus far are either restricted to very
special cases or are so laborious and prolix that even for numbers that do not
exceed the limits of tables constructed by estimable men, i.e. for numbers that
do not require ingenious methods, they try the patience of even the practiced
calculators.
It is well known that performing the product of two integers is a simple operation,
whereas, given the product, it is still a difficult operation to determine
the factors. Some one-way functions for cryptography, such as RSA and Rabin's
algorithm, have been constructed on this asymmetry. Both schemes are based on
the fact that given N = pq, the product of two prime numbers, it is hard to find
p and q when these numbers are large (bigger than 10^100). More precisely, these
schemes are well-grounded on related problems which are difficult to solve: 1)
RSA exploits the property that computing the order φ(N) = (p − 1)(q − 1) of the
multiplicative group in the set of remainders modulo N is equivalent to factorizing N; 2) Rabin's scheme exploits the property that solving a quadratic equation
x^2 = a mod N is equivalent to factorizing N.
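One direction of the RSA equivalence is elementary and worth making explicit: knowing φ(N) immediately factors N, since p + q = N − φ(N) + 1 and p, q are then the roots of a quadratic. The primes below are toy values for illustration.

```python
from math import isqrt

# Knowing phi(N) = (p-1)(q-1) factors N = pq:
#   p + q = N - phi(N) + 1, and p, q are the roots of x^2 - (p+q)x + N = 0.
def factor_from_phi(N, phi):
    s = N - phi + 1                   # p + q
    d = isqrt(s * s - 4 * N)          # p - q (the discriminant is a square)
    assert d * d == s * s - 4 * N
    return (s - d) // 2, (s + d) // 2

p, q = 1009, 2003                     # toy primes
N, phi = p * q, (p - 1) * (q - 1)
assert factor_from_phi(N, phi) == (1009, 2003)
```

The converse direction is trivial: from p and q, φ(N) is computed directly, so the two problems are computationally equivalent.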
The asymptotic law π(N) ∼ N/ln N for the distribution of prime numbers gives,
for the number of primes in a symmetric interval around √N,

    π(3√N/2) − π(√N/2) ≈ 3√N/(2 ln(3√N/2)) − √N/(2 ln(√N/2)) ≈ 2√N/ln N ;

thus, in a symmetric interval around √N, we find about as many primes as there are
between 1 and √N.
8.4.1
Factorization in Z
Let Fact(m) denote the factorization function when m = pq is the product of two
primes; if p and q are of the same order of magnitude, then both are of the order of √m. The only known deterministic algorithm for computing Fact(m) is the
exhaustive search; it follows that the complexity is of the order of √m, that is
C(Fact) = O(√m) = O(2^{n/2}), with n = log_2 m.
The following table summarizes the status of deterministic computational complexity for DL (discrete logarithm) and FZ (factoring):

             Direct                    Inverse
    DL  C_M  C(C_M) · O(log_2 M)       O(√M)
    FZ  Z_N  C(Z_N) · O(log_2 N)       O(√N)
C(C_M) and C(Z_N) denote the complexity of the product in C_M and Z_N, respectively. The conclusion is that the orders of magnitude of the complexities of the DL
and FZ problems are comparable. It follows that the choice between the two kinds
of one-way function is based on the criteria of cost, efficiency of the algorithms for
the underlying arithmetic, and reliability of the implementation.
The most efficient factorization algorithms are not deterministic. They are derived from an idea of Lucas's, and are based on properties of elliptic curves.
The complexity of factoring a number N = pq, the product of two primes, by non-deterministic algorithms is sub-exponential, that is, of the order of

    O(exp[k_o (ln N)^{1/3} (ln ln N)^{2/3}]) ,

where k_o ≈ (32/9)^{1/3}.

8.5

Discrete logarithm
[Figure 8.4: binary tree of square roots in a cyclic group of order 16; the terminal nodes carry the labels 1/16, 9/16, 5/16, 13/16, 3/16, 11/16, 7/16, 15/16.]
Write p − 1 = 2^{k_2} q, with q = (p − 1)/2^{k_2} odd. The integers (p − 1)/2^{k_2} and
2^{k_2} are relatively prime, thus a generalized Euclidean algorithm gives λ and μ such
that

    λ (p − 1)/2^{k_2} + μ 2^{k_2} = 1 .

In conclusion, every quadratic residue m can be split as m = A·B with

    A = m^{μ 2^{k_2}} ,   B = m^{λ (p−1)/2^{k_2}} ,

where A^{(p−1)/2^{k_2}} = 1, so that a square root of A is directly computed as
A^{(1+(p−1)/2^{k_2})/2}, while B^{2^{k_2}} = 1, so that a square root of B must be sought
in the cyclic group C_{2^{k_2}}. Moreover,

    x^{2^{k_2}} − 1 = (x^{2^{k_2−1}} + 1)(x^{2^{k_2−1}} − 1) ,
where the roots of Ψ⁺_{k_2}(x) = x^{2^{k_2−1}} + 1 are quadratic nonresidues, and the roots
of Ψ⁻_{k_2}(x) = x^{2^{k_2−1}} − 1 are quadratic residues. Since the square root of a Quadratic
NonResidue (QNR) does not exist, Ψ⁺_{k_2}(x) splits into the product of 2^{k_2−2} quadratic
irreducible polynomials x^2 − b, while Ψ⁻_{k_2}(x) splits into the product of 2^{k_2−2}
quadratic reducible polynomials x^2 − b, that is, Ψ⁻_{k_2}(x) ultimately splits into linear
factors.
In conclusion, the problem of computing the square root of any QR is reduced to
that of computing a square root of a QR in C_{2^{k_2}}. Analogously, a QNR is obtained
as a 2^{k_2}-th root of unity, constructing a sequence of primitive roots for every exponent:

    I_r = √(I_{r−1}) = (−1)^{1/2^{r−1}} ,

and I_{k_2} is a quadratic nonresidue, since

    I_{k_2}^{(p−1)/2} = (−1)^{(p−1)/2^{k_2}} = −1 .
An interesting description of C_{2^{k_2}} is obtained using a tree graph. Group elements are used to label the nodes of a binary tree, with 1 as label of the root, which
is connected by a single branch to the node labeled −1; every other node
is binary, with one input and two output branches, except the terminal nodes,
which have no output branches. The tree is full, and its height is k_2. If ℓ_1 is the
label of a node, then the extremes of the outgoing branches are labeled with ±√ℓ_1,
see Figure 8.4. Any QNR is the label of a terminal node, while the label of every
non-terminal node is a QR.
8.5.1
Given a cyclic group C of order M and a generator g C, a one-to-one correspondence exists between the elements of C and the elements of ZM , the additive
group of the ring of residues modulo M :
    c = g^ℓ ,   ℓ ∈ Z_M ,   ℓ = L_g(c) .

The exponentiation is a mapping from Z_M into C, while the inverse mapping, denoted L_g(c), from C into Z_M is said to be the discrete logarithm of c to base g.
The function Lg shares all typical properties of the real logarithm, in particular
the logarithm of the product is equal to the sum of the logarithms, the logarithm
of a power is equal to the product of the exponent and the logarithm of the base,
and the logarithm of the cyclic group identity is 0 ZM .
Whereas the exponential g ` can be easily calculated, in general the discrete logarithm is difficult to calculate (or at least it is not known how to calculate it easily).
It is remarked that the difficulty depends on the representation of the elements of
C. An example illustrating this point is obtained considering C = ZM . Let g 6= 1
8.5.2
    C(DL|p^d) = d · C(DL|p)
from which it follows that calculation of the discrete logarithm in CM can be reduced to calculating the discrete logarithm in two cyclic groups of order m1 and
m2 , or in a cyclic group of order p. Proofs of the two properties are given in the
same order.
Let M = m_1 m_2, the product of two relatively prime numbers, be the order of a cyclic group C_M with generator g. The discrete logarithm x = L_g(a) ∈ Z_M can be
written as x = x_1 ε_1 + x_2 ε_2 by the Chinese remainder theorem, with x_1 ∈ Z_{m_1} and
x_2 ∈ Z_{m_2}. The two multipliers

    ε_1 = m_2 (m_2^{−1} mod m_1) ,   ε_2 = m_1 (m_1^{−1} mod m_2)

satisfy the following conditions modulo M:

    ε_1^2 = ε_1 ,   ε_2^2 = ε_2 ,   ε_1 ε_2 = 0 .
These equations show that the two coefficients x_1 = L_{g_1}(a_1) and x_2 = L_{g_2}(a_2)
can be obtained as logarithms in two cyclic groups of order m_1 and m_2, respectively. In terms of complexity we may write

    C(DL|m_1 m_2) = C(DL|m_1) + C(DL|m_2) ,

plus an overhead of complexity due to the calculation of a_1 and a_2, which is
polynomial in ln M.
Let M = p^d, a power of a prime number p, be the order of a cyclic group C_M with generator g. Let x = L_g(a) ∈ Z_M be written as an integer base p

    x = ∑_{i=0}^{d−1} x_i p^i .
8.5.3
Shanks' Bound
- The table Tab requires √M log_2 M bits;
- The complexity for sorting Tab is not counted because it is performed once;
- Each search in the sorted Tab requires a maximum of log_2 M comparisons.
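These costs belong to Shanks' baby-step giant-step algorithm, which trades the table of about √M entries against O(√M) group operations. A minimal sketch (toy prime, dictionary lookup standing in for the sorted table):

```python
from math import isqrt

# Shanks' baby-step giant-step: solve g^x = y (mod p) with a table of
# about sqrt(M) entries and O(sqrt(M)) group operations, M = group order.
def bsgs(g, y, p, M):
    s = isqrt(M) + 1
    tab = {}
    t = 1
    for j in range(s):                  # baby steps: tab[g^j] = j
        tab.setdefault(t, j)
        t = t * g % p
    ginv_s = pow(g, -s, p)              # giant-step factor g^(-s)
    gamma = y
    for i in range(s):                  # giant steps: y * g^(-i*s)
        if gamma in tab:
            return i * s + tab[gamma]
        gamma = gamma * ginv_s % p
    return None

p = 1000003                             # a prime (toy size)
g, x = 5, 87654
y = pow(g, x, p)
k = bsgs(g, y, p, p - 1)
assert k is not None and pow(g, k, p) == y
```

A hash table replaces the sorted array here; with a sorted table each lookup would cost the log_2 M comparisons counted above.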
8.6

Searching unsorted data

The third mathematical topic that may be used to define one-way functions is the searching of data. The technical formulation of the problems requires a preliminary definition.
Let f : A → N be a mapping from a finite set into the set of positive integers.
The image f(a) ∈ N of a ∈ A is called the label of a. Two important searching
problems are
1. Given an object b with label ` = f (b), decide whether b is in A.
2. Given a label ` find a such that f (a) = `.
If the objects or data are sorted by labels, the search is trivial: the complexity of problem 1 is O(ln |A|), while the complexity of problem 2 is O(1).
Otherwise, the complexity of both problems is exactly |A|. A specific instance of
SUD problems useful in cryptography utilizes vector spaces, and is known as the
McEliece scheme, whose description is repeated here.
Let Vn be a vector space of dimension n on a finite field Fq . Let Vn,k be a subspace
of Vn of dimension k. Let wH (v) denote the number of non-null components of a
vector v, i.e. wH (v) is the Hamming weight of v, and defines a metric in Vn .
Let ℓ be a coset leader in the partition of the group V_n into cosets of the subgroup
V_{n,k}; the corresponding coset is written as ℓ + V_{n,k}. If words of minimum weight
in the cosets are chosen as coset leaders, the partition of Vn into cosets specifies a
minimum distance decoding.
In general, given an element v ∈ V_n, it is an NP problem to determine the coset
leader ℓ(v) of the coset to which v belongs [79]. Conversely, let d be the minimum
weight of the coset leaders, and set t = ⌊(d − 1)/2⌋; it is easy to generate an element
v = c + ℓ in some coset, where c is an element of V_{n,k}, and ℓ is any vector of
weight less than or equal to t. These two contrasting operations characterize a
one-way function based on SUD.
McEliece's idea has been to use an easily decodable code C with length n, dimension k, large minimum distance d, and generator matrix G. Then, two secret
matrices are selected, namely, an n × n random permutation matrix P, and a non-singular k × k random matrix B. A matrix E is computed as

    E = PGB ,

and represents the public key together with t = ⌊(d − 1)/2⌋, the code error-correcting
capability.
For encryption, a message m is encoded in a vector x of dimension k, which is
then transformed as

    y = Ex + e ,

where e is a random vector of weight not greater than t, kept secret.
Decryption consists of the following steps:
- Compute the vector P^T y = GBx + P^T e. This transformation does not alter the number of errors, i.e. the weight of e.
- Compute the syndrome

    s = HP^T y = HGBx + HP^T e = HP^T e ,

since the parity-check matrix H of the easy-to-decode C is known.
- Recover the error vector ẽ = P^T e from s; this is possible because the number of errors falls within the code error-correcting capability, and
the code generated by G has an efficient decoding algorithm.
- Compute x̃ = Bx by decoding the corrected word P^T y − ẽ.
- Obtain x = B^{−1} x̃.
This standard decoding procedure can only be performed knowing P and B,
which constitute the secret key. It is an open problem to prove or disprove whether
efficient decoding is also possible knowing only P, or only B, or neither.
McEliece's scheme is one of the few attempts exploiting SUD that is still valid.
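The whole scheme can be exercised on a toy instance. The sketch below uses the Hamming (7,4) code (n = 7, k = 4, d = 3, t = 1) as the easily decodable code, with the notes' column-vector convention E = PGB; the specific matrices B and P, and the tiny parameters, are illustrative choices with no security whatsoever.

```python
import random

# Toy McEliece on the Hamming (7,4) code, following the steps in the text.
G = [[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1],
     [0,1,1,1],[1,0,1,1],[1,1,0,1]]                      # 7x4 generator
H = [[0,1,1,1,1,0,0],[1,0,1,1,0,1,0],[1,1,0,1,0,0,1]]    # 3x7, H G = 0

B    = [[1,1,0,0],[0,1,0,0],[0,0,1,1],[1,0,0,1]]         # secret 4x4, invertible
Binv = [[1,1,0,0],[0,1,0,0],[1,1,1,1],[1,1,0,1]]         # its inverse mod 2
perm = [3,0,6,1,5,2,4]                                   # secret permutation P

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

def apply_P(v):                                          # P v
    return [v[j] for j in perm]

def apply_PT(v):                                         # P^T v
    w = [0] * len(v)
    for i, j in enumerate(perm):
        w[j] = v[i]
    return w

def encrypt(x):
    e = [0] * 7
    e[random.randrange(7)] = 1                           # error of weight t = 1
    c = apply_P(matvec(G, matvec(B, x)))                 # E x = P G B x
    return [(a + b) % 2 for a, b in zip(c, e)]

def decrypt(y):
    yp = apply_PT(y)                                     # G B x + P^T e
    s = matvec(H, yp)                                    # syndrome H P^T e
    if any(s):                                           # locate the single error
        pos = [[r[i] for r in H] for i in range(7)].index(s)
        yp[pos] ^= 1
    xt = yp[:4]                                          # B x (code is systematic)
    return matvec(Binv, xt)                              # x = B^{-1} (B x)

msg = [1, 0, 1, 1]
assert decrypt(encrypt(msg)) == msg
```

Because the Hamming code is systematic, the decoded codeword's first k positions give Bx directly; a real instance would use a large Goppa code, for which the syndrome decoding step is efficient only with the secret structure.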
Chapter 9
ECC
Arithmetic is one of the oldest branches, perhaps the very oldest
branch, of human knowledge, and yet some of its most abstruse
secrets lie close to its tritest truths.
H.J.S. Smith
9.1
Introduction
The acronym ECC stands for Elliptic Curve Cryptosystem, and denotes the class of
cryptographic systems whose key transformations are defined using the abelian
additive point groups of elliptic curves over finite fields. Elliptic curves were
originally defined over the complex field; they are algebraic cubic curves such
that in every point the tangent line is uniquely defined. In more technical terms,
an elliptic curve has no singular point, i.e. there are no points in which the tangent
is not unique, or is not defined at all. A few words to fix the terminology. An
algebraic curve C_f of degree n is identified by a polynomial f(x, y) of degree n in two
variables, with coefficients in a field K. Any straight line intersects C_f in at
most n points (exactly n points in a suitable extension of K). The coordinates of
any singular points Q_i = (x_i, y_i), i = 1, ..., s, of C_f are obtained by solving the
system
    f(x, y) = 0 ,   ∂f(x, y)/∂x = 0 ,   ∂f(x, y)/∂y = 0 .
Straight lines through a singular point have multiple intersections with the curve
in that point. By a non-singular curve we mean a curve with no singular points.
An elliptic curve over a field F is a non-singular curve described by a cubic polynomial in two variables with coefficients in F:

    f(x, y) = x^3 − a_0 y^2 − a_1 xy − a_3 y + a_2 x^2 + a_4 x + a_6 + a_7 x^2 y + a_8 xy^2 + a_9 y^3 .
[Table: examples of point sums A + B on a small curve, with point entries such as (2, 2), (2, 1), (1, 0), (0, 0); the table layout was garbled in extraction.]
9.2
An algebraic plane curve is the set of points P (x, y) whose coordinates x and y
(i.e. coordinates in affine planes) satisfy an algebraic equation f (x, y) = 0, where
f (x, y) is a polynomial in x and y. Plane algebraic curves are also defined referring
to homogeneous coordinates X, Y , and Z (i.e. coordinates in projective planes).
The two sets of coordinates, projective and affine, are related as

    x = X/Z ,   y = Y/Z .
Therefore, a point P(x, y) can be singled out by any triple (X, Y, Z) with X = xZ
and Y = yZ when Z ≠ 0. The triples (X, Y, 0) represent points at infinity. The
triple (0, 0, 0) is excluded.
The most visible difference between the two representations is that homogeneous
coordinates represent points at infinity plainly, while in affine coordinates points
at infinity need a special symbol, like ∞ for x and/or y, and must be treated as
exceptional points. Furthermore, homogeneous coordinates offer some computational advantages in ECC.
Definition 9.1. Let F be either an infinite or a finite field. An elliptic curve E over F
consists of a set of points P = (x, y) whose coordinates x and y satisfy a cubic equation
    y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6                 (9.2)

    y^2 + a_1 xy = x^3 + a_4 x + a_6 ,
    y^2 + a_3 y = x^3 + a_4 x + a_6 .                                  (9.3)
9.2.1
Group Law
The points of E[F] form an abelian group (the point at infinity O is the group identity) for a point sum defined by exploiting the property that a straight line has exactly three points in common with the curve. Despite its purely algebraic nature, the
group law was discovered by Giulio Fagnano during his search for a duplication
law for elliptic integrals [28]. Fagnano's achievement was generalized by Euler,
resulting in the addition law for this sort of integral.
The composition law may be described in algebraic terms referring to a canonical
form of the elliptic curve equation
    y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6 .               (9.4)
Addition of two different points P_1(x_1, y_1) and P_2(x_2, y_2) on the curve, written as
P_1(x_1, y_1) + P_2(x_2, y_2), and duplication of a single point, written as 2P_1(x_1, y_1), will
be considered separately.
Addition. Given P_1(x_1, y_1) and P_2(x_2, y_2) on the curve (9.4), let P_3(x_3, y_3) denote
their sum. To express the coordinates x_3 and y_3 in terms of x_1, x_2, y_1 and y_2, consider the line ℓ through P_1(x_1, y_1) and P_2(x_2, y_2), which has equation

    y = m(x − x_1) + y_1   with   m = (y_2 − y_1)/(x_2 − x_1) .
The line ℓ meets the curve (9.4) in a unique third point P_s(x_s, y_s) whose coordinate
x_s is found by solving a third-degree equation

    x^3 − (m^2 + a_1 m − a_2)x^2 + b_4 x + b_6 = 0 ,
obtained by substituting the expression for y given by the equation of the line
in (9.4) and rearranging the terms: the coefficients b4 and b6 are deliberately not
reported because they are not necessary in the computation. Since two roots of
the cubic polynomial, namely x_1 and x_2, are known, the third root x_s may be
simply obtained from the sum of the roots, that is, from the coefficient of x^2:

    x_1 + x_2 + x_s = m^2 + a_1 m − a_2 .

We have

    x_s = m^2 + a_1 m − a_2 − (x_1 + x_2)   and   y_s = y_1 + m(x_s − x_1) ,
and the addition point is given by the intersection of the vertical line through
P (xs , ys ) with the elliptic curve (the third point is the so called point at infinity).
Thus, the coordinate x_3 is simply equal to x_s, whereas the coordinate y_3 is the
second root of the equation

    y^2 + (a_1 x_3 + a_3)y = x_3^3 + a_2 x_3^2 + a_4 x_3 + a_6 .

We do not need to solve this equation because we already know the root y_s. Thus,
from the sum −(a_1 x_3 + a_3) of the roots, in conclusion we obtain

    x_3 = m^2 + a_1 m − a_2 − (x_1 + x_2)
    y_3 = −(a_1 x_3 + a_3) − y_1 − m(x_3 − x_1) .                      (9.5)
Duplication. The computation of 2P(x_1, y_1) uses equations very similar to those
used for addition, the line through two points being replaced by the tangent to
the elliptic curve in P(x_1, y_1). This tangent has equation

    y = m(x − x_1) + y_1   with   m = (3x_1^2 + 2a_2 x_1 + a_4 − a_1 y_1)/(2y_1 + a_1 x_1 + a_3) .

The tangent meets the curve (9.4) in a unique third point P_s(x_s, y_s) whose coordinate x_s is found by solving a third-degree equation, and finally we have

    x_s = m^2 + a_1 m − a_2 − 2x_1   and   y_s = y_1 + m(x_s − x_1) .
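The chord-and-tangent formulas above can be checked numerically. The sketch below specializes them to the reduced equation y^2 = x^3 + ax + b over F_p (i.e. a_1 = a_2 = a_3 = 0 in (9.5)); the curve, prime and point are toy illustrative values, and the point at infinity O is represented by None.

```python
# Affine chord-and-tangent addition on y^2 = x^3 + ax + b over F_p,
# the special case a1 = a2 = a3 = 0 of formulas (9.5); toy curve.
p = 97
a, b = 2, 3

def add(P, Q):
    if P is None: return Q                   # O is the identity
    if Q is None: return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                          # P + (-P) = O
    if P == Q:
        m = (3*x1*x1 + a) * pow(2*y1, -1, p) % p     # tangent slope
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p      # chord slope
    x3 = (m*m - x1 - x2) % p
    y3 = (-y1 - m*(x3 - x1)) % p
    return (x3, y3)

P = (3, 6)        # on the curve: 6^2 = 36 = 27 + 6 + 3 (mod 97)
assert (P[1]**2 - P[0]**3 - a*P[0] - b) % p == 0
Q = add(P, P)
R = add(Q, P)
for x, y in (Q, R):                          # results stay on the curve
    assert (y*y - x*x*x - a*x - b) % p == 0
```

Doubling uses x_1 + x_2 replaced by 2x_1, exactly as in the duplication formulas; all divisions are modular inversions.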
F̄ denotes the algebraic closure of F, that is, the field that includes every root of
any polynomial with coefficients in F.
9.3

The invariants Δ and j again have the structure given above; in particular, in
fields of characteristic 2 their expressions assume the simpler form

    Δ = a_1^5 a_3 a_4 + a_1^4 a_2 a_3^2 + a_1^4 a_4^2 + a_3^4 + a_1^6 a_6 + a_1^3 a_3^3 ,   j = a_1^{12}/Δ .      (9.7)

In fields of odd characteristic, referring to the reduced equation y^2 = x^3 + a_4 x + a_6,
the affine coordinates of P_3 = P_1 + P_2 are

    x_3 = λ^2 − (x_1 + x_2) ,   y_3 = λ(x_1 − x_3) − y_1 ,

where

    λ = (y_1 − y_2)/(x_1 − x_2)   if P_1 ≠ P_2 ,
    λ = (3x_1^2 + a_4)/(2y_1)     if P_1 = P_2 .
In F_{2^m} it is necessary to distinguish between supersingular (j = 0) and non-supersingular (j ≠ 0) curves. Furthermore, the equations are written in a form
evidencing the minimum number of field arithmetical operations.
Addition formula when j ≠ 0. The curve equation is

    y^2 + xy = x^3 + a_2 x^2 + a_6

and the affine coordinates of P_3 = P_1 + P_2 are, if P_1 ≠ P_2,

    x_3 = ((y_1 + y_2)/(x_1 + x_2))^2 + (y_1 + y_2)/(x_1 + x_2) + (x_1 + x_2) + a_2
    y_3 = ((y_1 + y_2)/(x_1 + x_2))(x_1 + x_3) + x_3 + y_1

and, if P_1 = P_2,

    x_3 = x_1^2 + a_6/x_1^2
    y_3 = x_1^2 + (x_1 + y_1/x_1)x_3 + x_3 .
In homogeneous coordinates, the formulas assume the apparently more complex
form

    Z_3 = [Z_1 Z_2][X_1 Z_2 + X_2 Z_1]^3   if P_1 ≠ P_2 ,

and

    Z_3 = (Z_1 X_1)^3   if P_1 = P_2 .
Addition formula when j = 0.
if P1 = P2 .
    y^2 + a3 y = x^3 + a4 x + a6

and the affine coordinates of P3 = P1 + P2 are

    x3 = ((y1 + y2)/(x1 + x2))^2 + (x1 + x2)
    y3 = ((y1 + y2)/(x1 + x2)) (x1 + x3) + y1 + a3               if P1 ≠ P2

and

    x3 = (x1^4 + a4^2)/a3^2
    y3 = ((x1^2 + a4)/a3) (x1 + x3) + y1 + a3                    if P1 = P2 .
In homogeneous coordinates the formulas assume the form

    Z3 = Z1 Z2 (X1 Z2 + X2 Z1)^3         if P1 ≠ P2 ,

and

    X3 = ((1/a3) X1^2 + (a4/a3) Z1^2)^2 Z1^2
    Y3 = ((1/a3) X1^2 + (a4/a3) Z1^2)^3 + ((1/a3) X1^2 + (a4/a3) Z1^2) X1 Z1^3 + Y1 Z1^5 + a3 Z1^6
    Z3 = Z1^6                            if P1 = P2 .
Given P ∈ E(Fq) and an integer x, the point xP is defined to be the sum of P with itself iterated x times

    xP = P + P + · · · + P    (x terms) .
The subgroup of Fq-rational points is of interest in several applications. In particular, its order and its structure are important in cryptography.

Theorem 9.1. Let E be an elliptic curve defined over Fq, q = p^m, where p is a prime. Then two integers n and k exist such that E(Fq) is isomorphic to Zn × Zk. Furthermore k | n and k | (p^m - 1).

Zn denotes the additive group of remainders modulo n, which is the prototype of any cyclic group of order n.
Hasse's theorem gives a good estimation of the order of E(Fq).

Theorem 9.2 (Hasse).

    #E(Fq) = p^m + 1 - t ,    with    |t| ≤ 2 √(p^m) .
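Hasse's bound can be checked by brute force on a small example; the curve y^2 = x^3 + 2x + 3 over F_97 is an arbitrary choice made for this illustration.

```python
# Count the F_p-rational points, including the point at infinity, and verify
# Hasse's inequality |t| <= 2*sqrt(p) for t = p + 1 - #E(F_p).

p, a4, a6 = 97, 2, 3

# number of y in F_p with y^2 = r, for each residue r
sq_count = {}
for y in range(p):
    r = y * y % p
    sq_count[r] = sq_count.get(r, 0) + 1

N = 1                                    # the point at infinity
for x in range(p):
    rhs = (x**3 + a4*x + a6) % p
    N += sq_count.get(rhs, 0)            # solutions y of y^2 = x^3 + a4*x + a6

t = p + 1 - N
assert t * t <= 4 * p                    # Hasse's theorem
```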
The following lemma may be useful in the explicit computation of group order
and structure of elliptic curves.
9.4 EC Public-key Schemes

9.5

The adoption of EC schemes, based on the discrete logarithm, instead of public-key systems based on factoring, closely depends on finite-field arithmetic and on the computational complexity of point sums over elliptic curves. The computation refers to the canonical form of the equation in F2m.
Actually, xP with x large can be computed using the binary method:
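A sketch of the binary (double-and-add) method, written for a generic additive group; here the group operation is simply addition modulo n, standing in for the elliptic-curve point sum.

```python
# Binary method: x*P with O(log x) doublings and additions, scanning the bits
# of x from the least significant upward.

def binary_multiple(x, P, add, zero):
    result, addend = zero, P
    while x:
        if x & 1:
            result = add(result, addend)
        addend = add(addend, addend)     # doubling step
        x >>= 1
    return result

n = 1009
add_mod = lambda u, v: (u + v) % n       # toy group standing in for E(F_q)
assert binary_multiple(123456, 7, add_mod, 0) == 123456 * 7 % n
```

With an elliptic-curve `add`, the same loop costs about log2(x) doublings plus, on average, half as many additions, which is why the operation counts below matter.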
Number of operations in F2m for doubling a point in affine coordinates

    j ≠ 0    q    s    p    i
    x3       2    1    1    1
    y3       0    3    2    0
    Tot.     2    4    3    1

and, with each inversion expanded into squarings and products,

    j ≠ 0    q      s    p
    x3       m+1    1    1 + ⌊log2 m⌋
    y3       0      3    2
    Tot.     m+1    4    3 + ⌊log2 m⌋

    j = 0    q    s    p    i
    x3       2    1    1    0
    y3       1    4    2    0
    Tot.     3    5    3    0

(the j = 0 case requires no inversions, so its expanded counts coincide).
Number of operations in F2m for summing two distinct points in affine coordinates

    j ≠ 0    q    s    p    i
    x3       1    5    1    1
    y3       0    2    2    0
    Tot.     1    7    3    1

    j = 0    q    s    p    i
    x3       1    3    1    1
    y3       0    3    1    0
    Tot.     1    6    2    1

and, with each inversion expanded into squarings and products,

    j ≠ 0    q    s    p
    x3       m    5    1 + ⌊log2 m⌋
    y3       0    2    2
    Tot.     m    7    3 + ⌊log2 m⌋

    j = 0    q    s    p
    x3       m    3    1 + ⌊log2 m⌋
    y3       0    3    1
    Tot.     m    6    2 + ⌊log2 m⌋
where q, s, p, and i denote the number of squarings, sums, products, and inversions, respectively.
The costs (complexities) of squaring, summing, inverting, and multiplying in F2m are different, and also depend on the basis chosen for the field.
Number of operations in F2m for doubling (i.e. computing 2P) and for summing two distinct points in homogeneous coordinates, respectively.

    Doubling
    j ≠ 0    q    s    p
    X3       4    1    2
    Y3       0    4    3
    Z3       1    0    2
    Tot.     5    5    7

    j = 0    q    s    p
    X3       3    1    3
    Y3       0    3    6
    Z3       1    0    1
    Tot.     4    4    10

    Summing
    j ≠ 0    q    s    p
    X3       0    3    5
    Y3       0    3    3
    Z3       1    2    7
    Tot.     1    8    15

    j = 0    q    s    p
    X3       1    2    3
    Y3       0    2    3
    Z3       1    1    5
    Tot.     2    5    11
With homogeneous coordinates, the number of divisions, which are the most expensive basic operations, is drastically reduced, to 2.
Inversion in F2m. The inverse β^(-1) ∈ F2m of an element β is computed as a power with positive exponent:

    β^(-1) = β^(2^m - 2) = β^(2 + 2^2 + 2^3 + · · · + 2^(m-1)) .
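The exponentiation can be organized as m - 1 squarings and m - 2 products; a sketch in the toy field F_{2^4}, with the reduction polynomial x^4 + x + 1 assumed for the example.

```python
# Inversion by Fermat: beta^(2^m - 2) = beta^2 * beta^4 * ... * beta^(2^(m-1)).

M, m = 0b10011, 4                        # F_{2^4} with x^4 + x + 1

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= M
        b >>= 1
    return r

def gf_inv(a):
    t = gf_mul(a, a)                     # a^2
    result = t
    for _ in range(m - 2):
        t = gf_mul(t, t)                 # a^(2^k) by successive squaring
        result = gf_mul(result, t)
    return result                        # a^(2 + 4 + ... + 2^(m-1)) = a^(2^m - 2)

assert all(gf_mul(a, gf_inv(a)) == 1 for a in range(1, 16))
```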
Total numbers of operations in F2m for computing xP by the binary method in homogeneous coordinates:

    j ≠ 0    q: 7m - 1    s: 13m    p: 22m + ⌊log2 m⌋
    j = 0    q: 6m - 1    s: 9m     p: 21m + ⌊log2 m⌋
9.6
Historical Notes
The early developments of the theory of elliptic functions introduced the notion of functions φ(u) admitting an addition theorem, that is, functions satisfying a functional equation of the form

    φ(u + v) = F( φ(u), φ(v), dφ(u)/du, dφ(v)/dv )

where F(x1, x2, x3, x4) is a rational function of four variables. Elementary examples of functions with an algebraic addition theorem are the exponential and trigonometric functions, to which the elliptic functions are added.
1. The exponential function φ(u) = e^u satisfies the addition rule

    φ(u + v) = φ(u) φ(v) ,

thus F(x1, x2, x3, x4) = x1 x2.
2. The trigonometric function φ(u) = sin(u) satisfies the addition rule

    φ(u + v) = φ(u) dφ(v)/dv + φ(v) dφ(u)/du ,

thus F(x1, x2, x3, x4) = x1 x4 + x2 x3.
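Both elementary addition theorems can be checked numerically (a throwaway sketch; the arguments u and v are arbitrary).

```python
# phi = exp:  F(x1,x2,x3,x4) = x1*x2 ;   phi = sin:  F = x1*x4 + x2*x3
import math

u, v = 0.7, 1.3
assert abs(math.exp(u + v) - math.exp(u) * math.exp(v)) < 1e-12
# d(sin)/du = cos, so x1*x4 + x2*x3 = sin(u)*cos(v) + sin(v)*cos(u)
assert abs(math.sin(u + v) - (math.sin(u)*math.cos(v) + math.sin(v)*math.cos(u))) < 1e-12
```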
3. The elliptic integral

    u = ℘^(-1)(z) = ∫_z^∞ dx / √(4x^3 - g2 x - g3)           (9.8)

satisfies an addition rule of the form

    ∫_{z1}^∞ dx/√(4x^3 - g2 x - g3) + ∫_{z2}^∞ dx/√(4x^3 - g2 x - g3) = ∫_{z3}^∞ dx/√(4x^3 - g2 x - g3)    (9.9)

where

    z3 = -z1 - z2 + (1/4) [ ( √(4 z2^3 - g2 z2 - g3) - √(4 z1^3 - g2 z1 - g3) ) / (z2 - z1) ]^2 .

In terms of the Weierstrass function ℘(u) this is the addition formula

    ℘(u + v) = -℘(u) - ℘(v) + (1/4) [ ( d℘(v)/dv - d℘(u)/du ) / ( ℘(v) - ℘(u) ) ]^2 ,

which shows that

    F(x1, x2, x3, x4) = -x1 - x2 + (1/4) [ (x4 - x3)/(x2 - x1) ]^2 .
Setting

    z = ℘(u) ,    y = d℘(u)/du ,

the point P3 = (x3, y3) defined by the coordinates

    x3 = ℘(u1 + u2) ,    y3 = d℘(u1 + u2)/du

can be viewed as the sum of P1 = (x1, y1) and P2 = (x2, y2). It is straightforward to verify that P1, P2, and the point (x3, -y3) are collinear.
The addition property of elliptic functions sets up an abelian group structure with a particular associated rational function F(x1, x2, x3, x4). The group properties are a direct consequence of the correspondence

    P(x3, y3) = P(x1, y1) + P(x2, y2)    ⟺    u3 = u1 + u2 .

This group structure extends to the point set of an elliptic curve. Due to the algebraic interpretation of the addition law, the additive structure on the point set of an elliptic curve is maintained also for curves over finite fields. In this case the additive groups are finite groups, which provide a large set of abelian groups with properties that are suitable for defining one-way functions.
9.6.1
The origins

The discovery of the addition law may be attributed to the Italian nobleman Marchese Giulio Carlo Fagnano dei Toschi (1682-1766), and its first academic organization is due to Leonhard Euler, as is clear from J. Dieudonné's review (MR2000) of R. Ayoub's paper The lemniscate and Fagnano's contributions to elliptic integrals, Arch. Hist. Exact Sci. 29 (1984), no. 2, 131-149.
It is well known that in 1751 Euler was asked to examine the collected papers of the Italian nobleman G. Fagnano, and discovered in them unsuspected relations between special types of elliptic integrals, which led him to create the general theory of these integrals, culminating in the addition formula

    ∫_0^u dt/√P(t) + ∫_0^v dt/√P(t) = ∫_0^r dt/√P(t)

where P(u) = 1 + a u^2 - u^4 and r = (u √P(v) + v √P(u))/(1 + u^2 v^2). The particular case considered by Fagnano was a = 0; the integral then expresses the length of an arc of a lemniscate given by the parametric representation x = √((t^2 + t^4)/2), y = √((t^2 - t^4)/2). This had been known since James Bernoulli, and Euler had met the same integral in his work on elastic curves. The author explains that what Fagnano did was to prove for that integral the duplication formula, the particular case of the addition formula for v = u. From this and ingenious changes of variables and geometric considerations, he showed that
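Euler's addition formula can be verified numerically in the lemniscatic case a = 0 (an illustrative sketch using a plain Simpson rule; the arguments u and v are arbitrary small values).

```python
import math

def lem_integral(b, n=2000):
    """Simpson's rule for the lemniscatic integral int_0^b dt/sqrt(1 - t^4)."""
    f = lambda t: 1.0 / math.sqrt(1.0 - t**4)
    h = b / n
    s = f(0.0) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(k * h)
    return s * h / 3.0

u, v = 0.3, 0.4
# Euler's composition of arguments for P(t) = 1 - t^4
r = (u * math.sqrt(1 - v**4) + v * math.sqrt(1 - u**4)) / (1 + u**2 * v**2)
assert abs(lem_integral(u) + lem_integral(v) - lem_integral(r)) < 1e-9
```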
Chapter 10
Cryptanalysis
The job of the artist is always to deepen the
mystery.
Francis Bacon
10.1
Introduction
Cryptanalysis historically developed by probing the security of cryptosystems, which at that time had not been formally proved to be mathematically secure, both in the algorithmic aspects and in the managing protocols. The Polish breaking of the Enigma system, and the decryption of messages encrypted with Enigma machines at Bletchley Park (UK) during World War II, were a mix of cryptanalysis of the mathematical algorithm and penetration of protocol features based on intelligence.
Confining our cryptanalysis considerations to enciphering schemes for information concealment, the goals are to capture the secret key and/or the message content. Cryptanalytic processes are guided by two principles: one is essential to the formulation of the problem itself, and is aimed at disclosing protected information; the second is essential for recognizing that attacked enciphering schemes have been effectively broken. These two principles are formalized in the set of basic axioms of cryptanalysis.
10.2
Axioms
Three axioms are indispensable for characterizing cryptanalysis. Two of them are apparently obvious, but are necessary for correctly defining the operative framework of cryptanalytic actions, which should not be confused with intelligence activities.
The formulation of the first axiom may seem trivial, but it is indispensable for specifying any cryptanalytic attack: it is necessary to know that the message is a meaningful encrypted text, written in recognizable symbols using some encryption mechanism.
Axiom 10.1. Given a text T to be cryptanalyzed, then
1. T is written in symbols of a known alphabet.
2. T is an encrypted message.
3. T has been produced from a plain text by encryption mechanisms whose formal and
mathematical descriptions are known.
The second axiom is indispensable for deciding the success of any cryptanalytic
attack. Specifically, in order for cryptanalysis to make sense, it is necessary that,
once a text U is obtained from an encrypted message, the cost for verifying that U
is the true original message is close to zero.
Axiom 10.2. When the cryptanalysis of an encrypted message T produces a message U,
the cost for recognizing whether U is the right message has O(1) complexity.
Remark. It is worth observing, despite its obviousness, that any message may be encrypted into any other message. Thus, if the set of encryption rules (algorithms) and the set of plain messages are not properly restricted, any operation of cryptanalysis is meaningless.
In spite of their apparent simplicity, the conditions imposed by these axioms are essential in practice.

10.3 Attacks

The elements involved in an attack are:
- the enciphering algorithm A;
- the secret key K;
- the encrypted text E = e1, e2, . . ., which consists of a sequence of symbols from an alphabet C.
Attacks may start from different levels of knowledge, ranging from the most favorable conditions for the defender, to the most favorable conditions for the attacker. Further, the objectives of the attack may be different. In particular, we
consider three cases that are of major interest, together with the relative targets:
CTANI. Cipher Text Attack with No Information: in this attack, only the encrypted text E is known; the goal is to find the plain message M, without any information about the algorithm A or the secret key K.

CTA. Cipher Text Attack with side information: in this attack, the encrypted text E and the algorithm A are known, and the goal is to find the plain message M, and possibly the secret key K.

PTA. Plain Text Attack: in this attack, a pair comprising encrypted text E and plain message M is known, along with the encryption algorithm A; the goal is to find the secret key K (to be used in future decryptions).
When PTA succeeds, that is, when the secret key has been found, we say that the system has been broken. Since this kind of attack is the most favorable to the opponents, the designers of symmetric-key cryptosystems should design the encryption schemes to resist PTA for better security. To this end, cryptanalysis is useful to discover hidden bugs and trivial or less trivial weaknesses that may have escaped the designers. In this sense, DES cryptanalysis is paradigmatic, and shows that DES is secure with respect to the size of its parameters (key and text lengths). For this reason it will be described in detail, after describing some cryptanalysis of historically important enciphering algorithms.
10.3.1

Many famous enciphering schemes, namely Caesar's cipher, Alberti's disk cipher, Vigenère's cipher, the Hill cipher, and Bacon's cipher, are easily broken by a plain text attack. Therefore their cryptanalysis will only consider the cipher text attack, but with some side information: the algorithm to be attacked is assumed to be known along with the language of the message, while the plain text is assumed to be unknown. Given an encrypted text and no further information, the problem of finding which cipher is used among the various enciphering schemes will not be addressed here. This problem is certainly of the greatest importance to start the cryptanalysis of the message; however, besides trying cryptanalysis for every possible cipher scheme, there are a few methods, based on frequency analysis, for recognizing the encryption algorithm from the encrypted text.
In the following, we present cryptanalysis examples of cipher-text only attacks
on the following classic encryption algorithms:
A - Caesar cipher, with the alphabet not in standard order.
B - Vigenère cipher, with the alphabet in standard order.
C - Alberti disk cipher, with reference to an Alberti disk adapted to the English alphabet and used in the simplest mode; attacks on the more sophisticated modes also proposed by Alberti would require large encrypted texts, and are not suitable for exemplification.
D - Hill cipher, with a Hill matrix of dimension 2 or greater.
E - Bacon cipher, with either known or unknown partitions of the alphabet, but with known binary encoding of the alphabet.
A - Caesar cipher. Known plain-text cryptanalysis of a message encrypted with the Caesar scheme is trivial when the alphabet is taken in its natural order, since an exhaustive search within 26 possibilities breaks the system.
The cryptanalysis of Caesar systems that use the alphabet letters in a permuted order is less trivial. Attacks on these schemes, already in the Renaissance, made use of statistical methods and probabilities (frequencies), and these probabilistic methods are still today the most powerful tools for attacking cryptographic systems.
We will use a numerical example to examine how to proceed. Assume an alphabet consisting of 31 symbols, that is, the 26 letters of the alphabet plus 5 punctuation marks including blank (space). The letters are encoded with ordered numbers from 0 to 25, blank is encoded with 26, and the remaining symbols are encoded with 27, 28, 29, and 30, as shown in Table 10.1 together with the relative frequencies in the English language. A permutation π is written as (i → π(i)). A Caesar permutation of key k is denoted as (i → i + k), where the number i + k, obtained by addition modulo 31, is substituted for the number i. Therefore, a Caesar permutation performed after a permutation π is denoted as (i → π(i) + k), which is equivalent to a single permutation σ of the original ordered set, that is (i → σ(i)) with σ(i) = π(i) + k mod 31. In this operation mode, k may play the role of a key that changes with the message to be encrypted (that is, k has the meaning of communication or message key), and it is possibly known (or public, that is, communicated together with the encrypted message), while the role of common secret key is played by the permutation π, which should only be known to the authorized parties.
The permutation is then reconstructed by comparing the entries of column 3 of this Table with the entries of the third column of Table 10.1: when the frequencies are close (possibly the same), the letter of column 1 of Table 10.1 is written in the first column of Table 10.2. Using Table 10.2, it is easy to obtain, after some adjustments, a passage from Il principe by Niccolò Machiavelli.
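For the natural-order (shift-only) case, the exhaustive search over the 26 keys can be guided by frequency scoring; a sketch over the 26-letter alphabet, where the frequency table holds rounded reference values and the plaintext is the Lichtenberg aphorism that also appears in the Vigenère example below.

```python
# Recover the shift k of a Caesar cipher by scoring each of the 26 candidate
# decryptions against typical English letter frequencies (rounded percentages).

FREQ = {'a': 8.2, 'b': 1.5, 'c': 2.8, 'd': 4.3, 'e': 12.7, 'f': 2.2,
        'g': 2.0, 'h': 6.1, 'i': 7.0, 'j': 0.2, 'k': 0.8, 'l': 4.0,
        'm': 2.4, 'n': 6.7, 'o': 7.5, 'p': 1.9, 'q': 0.1, 'r': 6.0,
        's': 6.3, 't': 9.1, 'u': 2.8, 'v': 1.0, 'w': 2.4, 'x': 0.2,
        'y': 2.0, 'z': 0.1}

def shift(text, k):
    return ''.join(chr((ord(c) - 97 + k) % 26 + 97) for c in text)

def break_shift(cipher):
    # the candidate decryption whose letters best match English frequencies wins
    return max(range(26), key=lambda k: sum(FREQ[c] for c in shift(cipher, -k)))

plain = 'onemustjudgemennotbytheiropinionsbutbywhattheiropinionshavemadeofthem'
cipher = shift(plain, 7)
assert break_shift(cipher) == 7
```

For a fully permuted alphabet the same idea applies symbol by symbol, as in the Table 10.1 / Table 10.2 comparison above, but it needs much more text and some manual adjustment.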
B - Vigenère cipher. Cryptanalysis of the Vigenère encryption scheme, when only the cipher text is known, can be performed in two steps:

1. Compute the length ℓV of the secret key.
2. Obtain the secret key by performing ℓV frequency cryptanalyses on a breakdown of the text into ℓV classes, each class containing the symbols whose positions differ by a multiple of ℓV.

Vigenère-encrypted messages were considered practically unbreakable until the middle of the nineteenth century, when Friedrich W. Kasiski found a method (non-exhaustive analysis) for obtaining the likely length of the secret key: this key length is true with probability close to 1, or is deterministically exact, depending on the length of the message. Kasiski's idea is based on the observation of repeated patterns in the encrypted message. If, in the encrypted text, equal patterns (usually taken of length 2 or 3) come from the same text pattern, then they are encrypted with the same key characters: it follows that the distance between the patterns is a multiple of ℓV. Clearly, in this analysis coincidental occurrences should be disregarded, and Kasiski also proposed a method for dealing with spurious (false) coincidences. The method is described using a simple example.
Consider the encrypted message produced with a standard Vigenère table 31 × 31, where the capital letters have the meaning specified by Tables 10.1 and 10.2:

xrvjy yMxmx bhpim BqrgQ QcaQk Cmcti rvmCM mwmaM Qbefg
chfcQ ivmxe xysuv esbwz mxrej teQim BmhnQ athxq iPmha epzqt
xnrss Pk

Observe that the pattern [xr] repeats at distance 66, the pattern [im] at distance 60, and the pattern [Qi] at distance 23; then we have the greatest common divisors gcd{66, 60} = 6 and gcd{66, 60, 23} = 1, thus, very probably, [Qi] is a false coincidence, and ℓV = 6. The message is short; however, the frequency attack works: after some attempts, we find the secret key jerome, and the message, an aphorism of Lichtenberg's:

one must judge men not by their opinions but by what their opinions have made of them.
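Kasiski's test is easy to mechanize. The following sketch uses the standard 26-letter Vigenère; the deliberately repetitive plaintext and the key crypto are synthetic choices made for the example.

```python
# Distances between repeated trigrams in a Vigenere ciphertext are, barring
# coincidences, multiples of the key length.

from math import gcd
from functools import reduce

def vigenere(plain, key):
    keystream = (key * (len(plain) // len(key) + 1))[:len(plain)]
    return ''.join(chr((ord(p) + ord(k) - 2*97) % 26 + 97)
                   for p, k in zip(plain, keystream))

def kasiski_distances(cipher, size=3):
    first, dists = {}, []
    for i in range(len(cipher) - size + 1):
        pat = cipher[i:i + size]
        if pat in first:
            dists.append(i - first[pat])     # distance from the first occurrence
        else:
            first[pat] = i
    return dists

cipher = vigenere('attackatdawn' * 8, 'crypto')   # key length 6
g = reduce(gcd, kasiski_distances(cipher))
assert g % 6 == 0                                 # the key length divides the gcd
```

Here the plaintext repeats every 12 symbols and the key every 6, so the ciphertext repeats every 12 and the gcd of the distances is a multiple of the key length, exactly as in the hand computation above.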
C - Alberti's disk. The cipher-text-only cryptanalysis of messages encrypted with Alberti's disk requires approaches that are completely different from standard frequency methods, when the modes of use proposed by Alberti are implemented. Referring to the Alberti disk structure of Fig. 4.3, the stationary and movable
    Letter   Code   Freq. (%)
    a        00     5.856417935
    b        01     0.915065302
    c        02     1.746942850
    d        03     3.161134681
    e        04     8.734714250
    f        05     1.622161218
    g        06     1.688711422
    h        07     4.999584061
    i        08     5.814824058
    j        09     0.116462856
    k        10     0.457532651
    l        11     2.753514683
    m        12     1.256135097
    n        13     5.606854671
    o        14     6.239081607
    p        15     0.873471425
    q        16     0.06655020
    r        17     3.660261209
    s        18     5.257466101
    t        19     7.528491806
    u        20     2.104650195
    v        21     0.856833874
    w        22     1.514017137
    x        23     0.124781632
    y        24     3.105981199
    z        25     0.058231428
    blank    26     15.97728974
    -        27     0.06655020
    ,        28     1.081440812
    .        29     6.738208136
    ;        30     0.016637550

Table 10.1: Frequency distribution of the English language alphabet
    Symbol   Code   Freq. (%)
    a        18     5.602923264
    b        19     1.705237515
    c        16     2.192448234
    d        17     2.557856273
    e        02     10.96224117
    f        28     1.948842875
    g        23     0.9744214373
    h        03     5.481120585
    i        04     5.968331303
    j        26     0.1218026797
    k        27     0.1218026797
    l        09     3.045066991
    m        21     2.070645554
    n        10     6.455542022
    o        20     6.090133983
    p        01     0.8526187576
    q        14     0.
    r        11     4.141291108
    s        15     4.628501827
    t        24     7.917174178
    u        30     2.436053593
    v        22     1.461632156
    w        07     1.096224117
    x        06     0.1218026797
    y        08     1.096224117
    z        12     0.
    B        13     18.87941535
    M        29     0.1218026797
    C        00     1.218026797
    F        25     0.6090133983
    Q        05     0.1218026797

Table 10.2: Frequencies of the symbols of the encrypted text
disks are partitioned into 24 sectors: the capital letters on the external fixed (or stationary) disk are in lexicographic order, thus we may consider A as the initial reference letter of the list, which ends with four digits; the small letters on the movable disk are arranged in random order (the letter v is excluded), and constitute the secret key

    A B C D E F G I L M N O P Q R S T V X Z 1 2 3 4
    d e z l u s j m r k b t f g i c h n q x a p y o          (10.1)
There are many variants or modes of using the disk; we will consider three modes which present different levels of cryptanalytic difficulty.

Mode 1. Proposed in Alberti's book:

1. the small letters on the internal movable disk are in a random secret order, and a letter, say d, is considered as reference letter and kept secret;
2. the numbers 1, 2, 3, 4 are nulls, and thus may be randomly inserted in the plain text before encryption, and must be discarded after decryption;
3. capital letters inserted in the encrypted text indicate the alignment of the letter d of the movable disk with the given capital letter of the stationary disk for decrypting the next group of small letters.

Cryptanalysis is made harder by the presence of genuine random symbols in the encrypted text, which alter the statistics; further, the 24 relative positions of the two disks require (if all positions are used) a very long encrypted text in order to obtain significant estimations of the letter frequencies. For illustrative purposes, we assume that only three different positions of the movable disk are used; further assuming that the three capital letters in the encrypted text are A, Q, Z, an encrypted message looks like

    A y1 y2 · · · y_b1   Z z1 z2 · · · z_b2   A y_(b1+1) · · · y_(b1+b3)   Q x1 x2 · · · x_b4

From the structure of the encrypted message, it is evident that frequency cryptanalysis is possible, although the presence of random symbols (whose frequencies are unknown) makes identification of the characters more difficult. The decryption of short texts cannot be performed in any systematic way; it requires patient trials and a good deal of luck.
Mode 2. Proposed in Alberti's book, and considered by Alberti himself the hardest to break:

1. the small letters on the internal movable disk are in a random secret order, and a letter, say d, is considered as reference letter; it is kept secret, and initially is juxtaposed to the index letter A;
2. the numbers 1, 2, 3, 4 are encrypted in small letters; when decrypted, they indicate a change of alphabet (the change of alphabet is not known to the attacker, and further could be truly random); their encrypted value indicates the capital letter to be aligned with d.
Cryptanalysis of Mode 2 is very difficult and requires many guesses that are not driven by general rules. The attack methods must be ad hoc and depend closely on the message.

Mode 3. Alberti's disk is used in a way similar to Vigenère enciphering:

1. the small letters on the internal movable disk are in a random secret order, and a letter, say d, is considered as reference letter and kept secret;
2. the numbers 1, 2, 3, 4 are nulls, thus they may be randomly inserted in the plain text before encryption, and must be discarded after decryption;
3. after h letters are encrypted, the movable disk is rotated by k positions, then again h letters are encrypted. Both h and k are kept secret.

Due to the crude scheme, once h is known, frequency cryptanalysis is possible, considering blocks of h symbols at a distance of 24 blocks (i.e. blocks that are encrypted with the same Vigenère key) as having been encrypted with the same substitution rule. The unknown value of h can be found either by exhaustive search, since reasonable values of h are restricted to be less than 15, utilizing frequency analysis, or by some variant of Kasiski's technique. Once h is known, the shift value k is obtained by comparing the substitution rules of consecutive blocks of h symbols. Also in this case, the amount of encrypted text necessary (and frequently sufficient) for a systematic frequency attack is around 1000 characters.

In the following we give an example of cryptanalysis of a text encrypted with a further (simplified) variant of Mode 3. The attack technique is derived from methods for breaking Vigenère's scheme. The cryptanalysis exploits some weaknesses in the text, and proceeds more or less fortuitously.
Consider the encrypted message reported in Table 10.3, which has been produced using an Alberti disk with 26 letters, circularly ordered as follows:

    A B C D E F G H I 1 K L M N O P Q R S T U 2 W X Y Z     Stationary disk
    r k l n p t u z w x y s o m q i h f j v d b a c e g     Moving disk       (10.2)

The alphabet has been adapted to the English language, thus it is slightly different from that reported in equation (10.1). In particular, the plain text has been prepared for encryption, making the following modifications:

1) i and j have been converted to a single letter I;
2) u and v have been converted to a single letter V.
cwawsstwfjvnpjlfwkp
bkiodalwxvohbsotrlc
ciqjpvzrvaprufppnvq
blrokiovokkogjwrwbh
cmnpcspvvpfwmvzpoqd
bwxvoshrjwkkiodadob
cvqtafwvwmuwawssiqj
bhkhabkiokpashrjram
cvzptqfodsrrjwawjzt
bagotwdcvohltkwcarh
cmuvzpwmnpcspvvpfvq
bywchkwvxphkiwvvaki
cpfjorssspvvpfjlqff
borcabshbqkakioywch
cvrsspvvpfjrkqdpvzpo
Table 10.3: Message encrypted using Alberti's disk: Mode 3
3) The numbers 1, 2 have been randomly inserted in the text (in the plain-text alphabet they occupy the positions of the suppressed letters).
4) All blanks have been suppressed.
5) All punctuation marks have been deleted.
6) In the initial position of the movable disk, the letter r is aligned with A of the stationary disk.
7) After performing the first 19 letter encryptions, the movable disk is rotated counterclockwise by 8 positions.
8) After every 19 letter encryptions, the disk is rotated back or forth by 8 positions; therefore encryption of 19 consecutive letters is performed referring to two states of the disk. These states are identified as State 1 and State 2.

In spite of the oversimplification compared to Alberti's original indications, the text is too short to permit an immediate frequency attack. Therefore we have looked for patterns of three letters that are likely present in the original plain text. The article THE is certainly a highly probable pattern in the set of 26^3 = 17,576 possible patterns of length three.
Scanning the text reported in Table 10.3, we find that the pattern kio occurs 5 times, and the pattern vzp occurs 4 times. Since these numbers of occurrences are much higher than expected by chance, a reasonable guess is that both patterns are encryptions of THE, referred to State 2 and State 1 respectively. In this case, the block length (a state duration) is a number between 16 and 20. It follows that 5 cases should be considered; however, we will only consider the correct one, namely the duration 19.
The substitution alphabets of the two states, with the number of occurrences of each encrypted letter in the corresponding blocks, are:

    State 1
    plain:    a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
    cipher:   r  k  l  n  p  t  u  z  w  x  y  s  o  m  q  i  h  f  j  v  d  b  a  c  e  g
    counts:   5  0 10  3  0 12  0  0  2  9  2  2  5  4  3 22  9  7 12  4  3 20 12  0  0  6

    State 2
    plain:    a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
    cipher:   w  x  y  s  o  m  q  i  h  f  j  v  d  b  a  c  e  g  r  k  l  n  p  t  u  z
    counts:   7  6  4  6  0  5  2  6  6  5 10  2  5  1 12 10  7  8  8  3  1 14 11  1  1  1
D - Hill cipher. With a Hill matrix of dimension 2, pairs of plain symbols are encrypted as

    ( en   )     ( a  b ) ( xn   )
    ( en+1 )  =  ( c  d ) ( xn+1 )

where every pair (en, en+1) of encrypted symbols depends on a single pair (xn, xn+1) of plain-text symbols; therefore frequency attacks are possible on pairs of symbols, even if the Hill transformation produces a kind of equalization of probabilities. In terms of probabilities we have

    p{ en = α, en+1 = β } = p{ a xn + b xn+1 = α, c xn + d xn+1 = β }
                          = p{ xn = (d α - b β)/(ad - bc), xn+1 = (-c α + a β)/(ad - bc) } ,

which shows that the probabilities (hence the frequencies) of pairs are maintained after
encryption. A frequency table is constructed containing the number of occurrences of each pair; then the two most frequent pairs, say (y1m, y2m) and (y3m, y4m), are assumed to come from the two most frequent pairs of letters, say (x1, x2) and (x3, x4), in the language (supposing that the language of the plain text is known). A system of four linear equations in Fp is then written. As an example, consider the following encrypted message, given in the numerical encoding of Table 10.1:

17, 8, 5, 19, 29, 23, 24, 16, 3, 28, 19, 2, 12, 13, 0, 17, 14, 16, 20,
21, 26, 17, 1, 19, 15, 3, 16, 27, 25, 22, 6, 2, 16, 30, 8, 3, 0, 17,
2, 20, 14, 20, 20, 19, 4, 15, 16, 13, 24, 16, 16, 18, 8, 25, 29, 23, 18,
14, 25, 22, 6, 2, 16, 30, 8, 3, 0, 17, 24, 28, 0, 15, 14, 9, 17, 5, 8, 27
The message is too short to effectively exploit the laws of large numbers. However, there is a pattern that occurs three times, namely (0, 17), six patterns that occur twice, and also the pattern (8, 3, 0, 17) occurs twice. The most probable pattern (0, 17) may correspond to (t blank) or to (e blank), thus (8, 3) may be (e n) or (n e) in the first instance, and (n t) in the second case. We have the systems

    19 a + 26 b = 0          4 a + 13 b = 8
    19 c + 26 d = 17         4 c + 13 d = 3
Let us assume ad - bc = 1. From the first and third equations we find a = 7 and b = 8, and from the second and fourth equations we find c = 1 and d = 19; we check that these values yield a modular Hill matrix by verifying that ad - bc = 1: actually, we have 7 · 19 - 1 · 8 = 1 mod 31. With the corresponding Hill matrix

    ( 7   8 )
    ( 1  19 )

the following message is easily obtained:

life is the art of drawing sufficient conclusions from insufficient premises.

The method works plainly, requiring few or no guesses, if the encrypted text is sufficiently long: in our numerical case that means some 1000 characters.
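The recovered key can be verified mechanically: since the determinant is 1, the inverse of the Hill matrix modulo 31 is [[19, -8], [-1, 7]], and applying it to the first pairs of the numerical message recovers the opening words (a sketch; codes 0-25 are a-z and 26 is the blank, as in Table 10.1).

```python
# Decrypt the beginning of the numerical message with the inverse Hill matrix.

p = 31
cipher = [17, 8, 5, 19, 29, 23, 24, 16, 3, 28, 19, 2]
INV = [[19, 23], [30, 7]]            # [[d, -b], [-c, a]] mod 31, det = 1

def decode(codes):
    return ''.join(' ' if c == 26 else chr(c + 97) for c in codes)

plain_codes = []
for i in range(0, len(cipher), 2):
    e1, e2 = cipher[i], cipher[i + 1]
    plain_codes.append((INV[0][0] * e1 + INV[0][1] * e2) % p)
    plain_codes.append((INV[1][0] * e1 + INV[1][1] * e2) % p)

assert decode(plain_codes) == 'life is the '
```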
E - Bacon cipher. The 31 symbols of the alphabet are encoded in binary as follows:

a=(0,0,0,0,0), b=(0,0,0,0,1), c=(0,0,0,1,0), d=(0,0,0,1,1), e=(0,0,1,0,0),
f=(0,0,1,0,1), g=(0,0,1,1,0), h=(0,0,1,1,1), i=(0,1,0,0,0), j=(0,1,0,0,1),
k=(0,1,0,1,0), l=(0,1,0,1,1), m=(0,1,1,0,0), n=(0,1,1,0,1), o=(0,1,1,1,0),
p=(0,1,1,1,1), q=(1,0,0,0,0), r=(1,0,0,0,1), s=(1,0,0,1,0), t=(1,0,0,1,1),
u=(1,0,1,0,0), v=(1,0,1,0,1), w=(1,0,1,1,0), x=(1,0,1,1,1), y=(1,1,0,0,0),
z=(1,1,0,0,1), blank=(1,1,0,1,0), ,=(1,1,0,1,1), .=(1,1,1,0,0), :=(1,1,1,0,1),
?=(1,1,1,1,0)
Consider the encrypted message reported in Table 10.7. A cryptanalytic attack may be based on guessing highly probable patterns: for instance, suppose that the article the occurs in the plain text; then blocks of 15 cipher-text letters are sequentially analyzed, considering a sliding window of 15 characters and checking for compatibility with the pattern

    10011, 00111, 00100

The compatibility with a partition is checked, supposing that 1 is associated to letters in B1 and 0 is associated to letters in B0. If there is a compatible solution, then some letters are definitively allocated to B1 and B0. The same procedure is repeated for other highly probable patterns, until the partition is practically found.
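The compatibility check is a simple constraint test: within a window, no letter may be forced to carry both bit values. A sketch:

```python
# Sliding-window compatibility of a 15-letter window with the bit pattern of
# 'the' = 10011 00111 00100 in the Bacon encoding above.

THE = '100110011100100'

def compatible(window, pattern=THE):
    forced = {}
    for letter, bit in zip(window, pattern):
        if forced.setdefault(letter, bit) != bit:
            return False             # the same letter would be both 0 and 1
    return True

assert not compatible('qvszqamifgxasgd')     # s is forced to both values
assert compatible('aqhuvmmklpiovdi')         # the match discussed in the text
```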
For example, consider the first 15 letters

    qvszq amifg xasgd
    10011 00111 00100
It is evident that this cannot be the pattern the, because the letter s in the first block of five letters should be 0, while in the third block it should be 1. Then we slide the window by five characters

    amifg xasgd qklab
    10011 00111 00100

and again the letter a should be 1 in the first group of five letters, while it should be 0 in the second group. Continuing this way, we arrive at a pattern starting with the letter in position 156

    aqhuv mmklp iovdi
    10011 00111 00100
in which no incompatibility occurs; then we may reasonably assume that this pattern corresponds to the article the, and identify some characters of B0 and B1, that is

    q, h, m, i, o, d ∈ B0 ,    a, u, v, k, l, p ∈ B1
Sliding the window up to position 766, another compatible pattern is found

    reqxc zmlgj hichd
    10011 00111 00100
qvszqamifgxasgdqklabpmpdhmbvcytqvvl
nwwfcjxonwjsbjloaunmvrsumbiimceijqb
cndxqkwbyftepzbhcrmtqqrzztvloomzhdp
hbcztjebsjitgboeqdlkachaslyeuuhwcjx
thmqenbenvnuwntaqhuvmmklpiovdirqyou
dhnohcnejmslzidfdqaivkhphoxgocicuay
cztlpiipalskqbebpazammfrhvgyrtdllwq
dgnfzkzeelssjiqvxogbdwmpfwpdmihikqj
ysvtxeryqbtboabgsvsehpianlzhpupjyct
fyecxylxrbfvdneubbxfzztzbhleaeqtfso
gnbpsmxohmzurtkxnwuhbhaakdhzttinnwn
qzexxfnyxkrasnstnuweikucslwzwkzwrzd
agmxqocrrnhspeikimermvtsbofynpbavxz
rqamhktbaypamlyuioakwjjjdnuwuzssykb
wkxpwhacoksohjautvitmosadreexlkahcx
llmudsakciasdwjrpscdbfphwmrvadvbeix
mdvzealhlortktmoaltkityvozorzynwzwa
ntzccmeooqhfmqybnkyvfjzvdynzomsuutl
almgtmchosuimjvuimpsavtkdpiyktlolhi
zohcwqowaweylstrzwrouzovigjmkcnvhuy
ldwfldscnghdwbwojxoxanoxeahokkyxrvs
xcmpyrhmggsihhsyjyximqrohcnhubreqxc
zmlgjhichdurqgzmnblkmmpqzdqzhowmixx
xnzneopqmhoulhkakbgsktinueykaasbgiz
jntfdtahshqxlwvjmzrfchowlzlfvttqbpf
pmazomtbkeummxcmxhwqmcagozxnzgnvtrb
zpgcymtfhlacigemmdyekrmkzoxcsfzmnbz
rsulhvfyxequxrbnotbcimbpkhtjoecombx
nxzgbhkcvqehcipavdndciwgnyiacrqlsdo
olxyazykfdahwlyffpmm
Table 10.7: Encrypted message using Bacon's cipher
so that the sets are enlarged to

    q, h, m, i, o, d, e, z ∈ B0 ,    a, u, v, k, l, p, r, x, c, g, j ∈ B1
With this information, the encrypted text may be modified, substituting their binary values for the letters, obtaining the following encoded/encrypted message
01s00100f111s100111b101000b11yt0111
nwwf1110nw1sb11011n011s10b00010010b
1n0101wbyft010b0110t00100t110000001
0b10t10bs10t1b0000111101s1y0110w111
t0000nb0n1n1wnt10011001110010010y01
00n001n010s1000f001011010011010111y
10t1100111s10b0b110100f1011y1t011w0
01nf010001ss1001101b0w01fw100000101
ys1t101y0btb01b1s1s00101n1001111y1t
fy011y111bf10n01bb1f00t0b010100tfs0
1nb1s01000011t11nw10b0111000tt0nnwn
00011fny1111snstn1w00111s1w0w10w100
110100111n0s1001000101tsb0fyn1b1110
101001tb1y1101y10011w1110n1w10ssy1b
w111w01101s00111t10t00s101001111011
11010s11101s0w111s10bf10w011101b001
00100110101t1t0011t10ty100010ynw0w1
nt011000000f00ybn1y1f1010yn000s11t1
1101t0100s100111001s11t1010y1t10100
0001w00w1w0y1st10w101001011011n101y
10wf10s1n100wbw011011n01010011y111s
1101y10011s000sy1y10001001n01b10011
0011100100110100nb110010000000w0011
1n0n0010000110111b1s1t0n10y111sb100
1ntf0t10s0011w11001f100w101f1tt0b1f
101000tb1010011010w00111001n01n1t1b
0111y0tf0111010000y011010011sf00nb0
1s1101fy100111bn0tb100b110t100100b1
n101b0111000101110n010w1ny011101s00
011y10y1f010w1yff100
Many patterns of 5 bits may already be converted into letters; the remaining ones can be converted by guesswork and compatibility, since the message has a semantic meaning. With very few attempts, it is possible to arrive at the following sentence of Machiavelli's:
it ought to be remembered that there is nothing more difficult to take in hand,
more perilous to conduct, or more uncertain in its success, than to take the
lead in the introduction of a new order of things.
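The biliteral mechanism used above can be sketched in a few lines of code. This is a 26-letter variant (the classical cipher used a 24-letter alphabet); the class B1 is the partial one recovered in the example, and the representative cover letters chosen by `encode` are arbitrary:

```python
# Sketch of Bacon's biliteral cipher: each secret letter becomes a 5-bit
# pattern, and each bit is carried by the class (B0 or B1) of a cover letter.
import string

ALPHA = string.ascii_lowercase
B1 = set("auvklprxcgj")        # letters carrying bit 1, as in the example

def to_bits(ch):
    return format(ALPHA.index(ch), "05b")

def decode(cover):
    """Read the classes of the cover letters and regroup them by 5."""
    bits = "".join("1" if ch in B1 else "0" for ch in cover if ch.isalpha())
    usable = len(bits) - len(bits) % 5
    return "".join(ALPHA[int(bits[i:i+5], 2)] for i in range(0, usable, 5))

def encode(secret, letter0="s", letter1="a"):
    """Hide `secret` using one representative letter from each class."""
    return "".join(letter1 if b == "1" else letter0
                   for ch in secret for b in to_bits(ch))

cover = encode("machiavelli")
print(decode(cover))           # -> machiavelli
```

A real cover text would of course interleave many different letters from each class, as in Table 10.7; the round trip above only illustrates the bit-carrying principle.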
10.4 DES Cryptanalysis
DES cryptanalysis began as soon as the NIST (formerly the NBS) published the algorithm, which was designed by a team at IBM, who also used their secret (private) knowledge of cryptanalysis. In part, this knowledge was re-discovered and published many years later. Differential cryptanalysis and linear cryptanalysis, when the two methods became publicly known, proved that DES could not be broken in practice by these techniques. The explanation concerning differential cryptanalysis was given by Don Coppersmith, who stated that such cryptanalysis methods were known at IBM and used in the DES design. Coppersmith also indicated the way to prove that DES is not, in a technical sense explained below, a group. A proof that DES cannot be broken by means of attacks based on group properties was published by Campbell and Wiener in 1992 [9].
The set of messages is the vector space F_2^64, and the set of keys is F_2^56; therefore, there are 2^56 DES transformations, each identified (parametrized) by a key k; we denote this dependency as DES[k]. A sequence of transformations on the same plain message X is written as

DES[k_n](DES[k_{n-1}](. . . (DES[k_1](X)) . . .)) .
If DES is a group for this composition law, then the sequence of n transformations is equivalent to a single transformation, say DES[k_{a(n)}]. In this case, an attack à la Shanks can be mounted, with complexity 2^28 (Kaliski, Rivest, and Sherman). Assuming a known plain text attack, a pair of plain text X and encrypted text Y is assumed to be known, and the problem is to find the secret key k such that Y = DES[k](X).
Attack à la Shanks
If DES is a group G, its order is 2^56, hence it is a 2-group; it follows that G has a subgroup H of order 2^28. The set T of right coset leaders of H in G contains 2^28 elements.
The unknown DES[k] may be written as a composition of two elements

DES[k] = DES[h]DES[t] ,    DES[h] ∈ H, DES[t] ∈ T ,

so that DES[h]^{-1}(Y) = DES[t](X). Computing and storing the 2^28 values DES[t](X), then searching for a match among the 2^28 values DES[h]^{-1}(Y), the pair (h, t), hence k, is found with about 2^29 operations.
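The meet-in-the-middle idea behind the attack can be illustrated on a toy cipher that genuinely is a group: multiplication by g^k in Z_p^*. The modulus, base, and key below are illustrative only; the point is that a key space of size m^2 is searched with about 2m operations, exactly as the 2^28 figure above would arise from a 2^56 key space.

```python
# Toy meet-in-the-middle "attack à la Shanks": the cipher E(k, x) = g^k * x
# mod p forms a group under composition of keys, so the unknown k can be
# split as k = h*m + t and found by matching two lists of size m.
p, g = 1000003, 2              # illustrative prime modulus and base

def E(k, x):                   # toy "encryption", a genuine group in k
    return pow(g, k, p) * x % p

def attack(X, Y, m=1000):
    """Recover k < m*m such that Y = E(k, X), using about 2*m operations."""
    table = {E(t, X): t for t in range(m)}      # all "t-parts", stored once
    g_inv_m = pow(g, -m, p)                     # undoes m key steps at once
    z = Y
    for h in range(m):
        if z in table:                          # match: Y * g^(-h*m) = E(t, X)
            return h * m + table[z]
        z = z * g_inv_m % p
    return None

k_secret = 123456
Y = E(k_secret, 42)
print(attack(42, Y))           # -> 123456
```

DES itself is not a group, which is precisely what the Campbell-Wiener result establishes, so this shortcut does not apply to it.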
To break DES, differential cryptanalysis requires 2^49 chosen plain texts. As revealed by Coppersmith, one of the team that participated in the DES project, DES was designed to be resistant to DC.
Linear cryptanalysis was discovered by Mitsuru Matsui, and needs 2^43 known plain texts (Matsui, 1993); the method was implemented (Matsui, 1994), and was the first experimental cryptanalysis of DES to be reported. There is no evidence that DES was tailored to be resistant to this type of attack.
Davies' attack: while linear and differential cryptanalysis are general techniques and can be applied to a number of schemes, Davies' attack is a specialized technique for DES, first suggested by Donald Davies in the 1980s, and improved by Biham and Biryukov (1997). The most powerful form of the attack requires 2^50 known plain texts, has a computational complexity of 2^50, and has a 51% success rate.
Cryptanalytic weaknesses. Let e_n denote the all-1s vector of length n. DES exhibits the complementation property, that is

DES[e_56 + k](e_64 + X) = e_64 + DES[k](X) ,

where the additions are performed modulo 2.
The complementation property means that the work for a brute force attack could
be reduced by a factor of 2 (or a single bit) under a chosen-plain-text assumption.
By definition, this property also applies to the triple DES cipher.
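The complementation property stems from the key entering the round function only through XOR; a toy 8-bit cipher (not DES, invented for illustration) makes this visible:

```python
# A toy 8-bit "cipher" (NOT DES) with the complementation property: the key
# enters only through XOR, so complementing both key and plain text leaves
# the inner value x ^ k unchanged and complements the output.
M = 0xFF                                  # e_8, the all-ones vector

def rotl(x, r=3):
    return ((x << r) | (x >> (8 - r))) & M

def E(k, x):
    return rotl(x ^ k) ^ k                # toy round: mix, permute, mix

# complementation property: E[~k](~x) = ~E[k](x), for every key and block
for k in range(256):
    for x in (0x00, 0x5A, 0xC3, 0xFF):
        assert E(k ^ M, x ^ M) == E(k, x) ^ M
print("complementation property holds")
```

Given the encryptions of both a chosen plain text X and of its complement, each trial key k simultaneously settles k and its complement, which is the source of the factor-of-2 reduction mentioned above.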
DES has four so-called weak keys. Encryption and decryption of any message
under a weak key k have the same effect (i.e. DES transformation is involutory):
DES[k](DES[k](X)) = X .
DES has also six pairs of semi-weak keys. Encryption with one of the pair of
semi-weak keys operates identically to decryption with the other (this property
would hold for every key, if DES were a group):
DES[k1 ](DES[k2 ](X)) = X .
It is easy enough to avoid the weak and semi-weak keys in an implementation, by testing for them explicitly. In any case, these keys are not really any weaker than other keys, as they do not give an attack any advantage.
It is known that the maximum cryptographic security of DES is limited to about 64 bits, even when all round subkeys are chosen independently instead of being derived from a key, which would otherwise permit a security of 768 = 2^8 · 3 bits.
10.5 Cryptanalysis of public key cryptosystems
The cryptanalysis of public key cryptosystems has a strictly algebraic basis, and uses methods proper to number theory, thus the approach is more formal. Nevertheless, the lack of an axiomatic measure of complexity means that definitive conclusions cannot be drawn about the strength of the algorithms. The following considerations concern only algorithms based on the difficulty of factoring, and on the discrete logarithm in finite groups. The problem of searching unsorted sets will not be considered, mainly because the algorithms based on this problem are not sufficiently mature, and do not have a large number of applications.
The strength of the RSA and Rabin public key algorithms depends on the difficulty of factoring the modulus N , N being the product of two primes. In particular, it is known that:
- To break the Rabin scheme is equivalent to factoring N = pq.
- It is an open question whether to break the RSA scheme is equivalent to
factoring N . Certainly, if we factor N , we break RSA; the converse has not
yet been proved.
To set up an RSA or a Rabin system, i.e. to generate the modulus N, naturally involves some probabilistic operations; therefore the complexity, strictly speaking, is not deterministic polynomial. The choice of the modulus N = pq must be random, therefore we must generate two random prime numbers p and q. The probability that a random number with n digits is prime is 1/(n ln 10) ≈ 1/(2.3 n), by the prime number distribution [36]. This means that, with a reasonable number
of attempts, a prime number can be found; obviously, every generated random
number must be tested. The probabilistic primality tests are very efficient and
practical. Nevertheless, from a theoretical point of view, deterministically testing whether a number is prime is a problem of polynomial complexity with respect to the number of its digits, a result proved by three Indian researchers, Manindra Agrawal, Neeraj Kayal, and Nitin Saxena [1].
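The generate-and-test procedure just described can be sketched with the Miller-Rabin probabilistic test; the digit counts and the number of rounds below are arbitrary choices:

```python
# Generate a random n-digit prime by repeated Miller-Rabin testing; the
# density 1/(2.3 n) guarantees only about 1.15*n odd candidates on average.
import random

def is_probable_prime(n, rounds=20):
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):      # quick trial division
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):             # Miller-Rabin with random bases
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False                # a is a witness: n is composite
    return True                         # prime with prob. >= 1 - 4**(-rounds)

def random_prime(digits):
    while True:
        c = random.randrange(10**(digits - 1), 10**digits) | 1   # odd candidate
        if is_probable_prime(c):
            return c

print(random_prime(30))
```

Each failed candidate is rejected quickly, so finding a prime of the sizes used by RSA takes only seconds; the AKS deterministic test mentioned above is polynomial but far slower in practice.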
The strength of the ElGamal and Diffie-Hellman public-key algorithms depends
on the difficulty of computing the discrete logarithm in a finite group, see Chapter
8. The final issue is the computation of the discrete logarithm in a cyclic group of
prime order, and little progress has been made in this direction since the problem
was first formulated.
10.5.1 Factorization
The status of cryptanalysis of the Rabin and RSA algorithms is the same as that
of integer factorization. Based on what is publicly known today, it is possible to
attack numbers of 160 digits, which are the product of two primes of about 80
digits each. Obviously, it is easier to factor numbers of the same size that have
more than two prime factors.
The status of the factorization of N = pq, product of two prime numbers, may be summarized as follows:

It is known that to factor N is equivalent to computing the Euler totient function φ(N), that is, computing φ(N) = (p - 1)(q - 1);

It is known that to factor N is equivalent to solving the modular equation x^2 = b mod N, i.e. finding the four roots, provided they exist.

The only known deterministic factorization algorithm for N is the exhaustive search, which has complexity O(√N). Therefore √N is an upper bound to the factorization complexity of N.
Lenstra's factorization of N = pq is a probabilistic algorithm, based on elliptic curves, having complexity O(exp(√(ln N ln(ln N)))), i.e. sub-exponential complexity.
This factorization method of Lenstra's considers point groups of elliptic curves E[Z_N] over the ring Z_N, which contains zero-divisors. Since N is the product of two primes, an elliptic curve E[Z_N] is a direct sum (by the Chinese remainder theorem) of elliptic curves over fields

E[Z_N] = E[Z_p] ⊕ E[Z_q] ,

that is, every point P_N(x, y) ∈ E[Z_N] can be written as

P_N(x, y) = P(x_1, y_1) ⊕ P(x_2, y_2) ,

where x = α_1 x_1 + α_2 x_2 and y = α_1 y_1 + α_2 y_2, with α_1 and α_2 the interpolation coefficients of the Chinese remainder theorem. The order |E[Z_N]| is the product of the orders |E[Z_p]| and |E[Z_q]|. Therefore, we have

A_N = |E[Z_N]| = (p + 1 - a_p)(q + 1 - a_q)
    = N + 1 + p + q - a_p(q + 1) - a_q(p + 1) + a_p a_q ,

where the defect numbers a_p and a_q satisfy Hasse's bounds |a_p| ≤ 2√p and |a_q| ≤ 2√q. Further, assuming that p and q are of the same order O(√N), A_N can be bounded as

(N^{1/4} - 1)^4 ≤ A_N ≤ (N^{1/4} + 1)^4 .    (10.4)
Initial test: evaluate the discriminant Δ = 4A^3 + 27B^2 of the elliptic curve, and check whether g = gcd{N, Δ} = 1. If not, and if Δ mod N ≠ 0, then N is factored. This lucky event is rare.

If g = 1, choose n such that n! < (N^{1/4} + 1)^4 = N + 4N^{3/4} + 6N^{1/2} + 4N^{1/4} + 1, that is, take n ≈ ln N / ln(ln N).
Making the heuristic assumption that randomly chosen elliptic curves have orders uniformly distributed in the interval (10.4), of width about 8N^{3/4}, it is possible to estimate the probability that a curve with points of small order exists, and the probability of meeting such a point.
Example. Consider N = 77. The factorization is obviously trivial, but the example is useful to explain how the method works, and the issues encountered. Take G = (1, 1) and the elliptic curve of equation y^2 = x^3 - 3x + 3, which contains G, and has discriminant Δ = -135 ≡ 19 mod 77, relatively prime with 77.
Since the prime factors of N are small, the points on the two curves E[F7 ] and
E[F11 ] may be easily listed, see table.
y^2 = x^3 - 3x + 3

  x | y in E[F7] | y in E[F11]
  0 |     -      |    5, 6
  1 |    1, 6    |    1, 10
  2 |     -      |    4, 7
  3 |     0      |     -
  4 |     -      |     0
  5 |    1, 6    |    5, 6
  6 |     -      |    5, 6
  7 |            |     -
  8 |            |     -
  9 |            |    1, 10
 10 |            |    4, 7
A direct count shows that |E[F7]| = 6 and |E[F11]| = 16, thus |E[F77]| = 96 = 2^5 · 3, which is a product of powers of small primes.
The sequences of points nG are reported in the following table:
                   E[F77]
 y^2 = x^3 - 3x + 3        y^2 = x^3 + 21x + 3
  n    n[1,1]                n    n[1,5]
  2    [75,76]               2    [13,74]
  3    [10,70]               3    [72,68]
  4    [5,71]                4    [18,58]
  5    [50,27]               5    [52,67]
  6    obst.                 6    [47,4]
                             7    [54,38]
                             8    [66,17]
                             9    [11,5]
                            10    [65,72]
                            11    [69,73]
                            12    [8,65]
                            13    obst.
If we try to compute 6(1, 1) as (1, 1) + 5(1, 1), we get an obstruction, and thus we find a factor of 77 as gcd{77, 50 - 1} = 7. A favorable obstruction occurs in the computation of nG when the point G has order a divisor of n modulo p and not modulo q. When n is also a divisor of the period of G modulo q, the situation is unproductive and N is not split: it is this asymmetrical behavior that allows factoring.
The curve y^2 = x^3 - 3x + 3 is lucky, because its order is a product of powers of small prime factors. The curve y^2 = x^3 + 21x + 3 has order A_N = 221 = 13 · 17, thus it splits 77 when we attempt to compute 13[1,5] = [1,5] + 12[1,5] = [1,5] + [8,65], since 13 is the minimum order of any cyclic subgroup; actually, we have gcd{77, 8 - 1} = 7. When this situation occurs in practice, the curve does not reduce the factorization complexity with respect to Shanks' method.
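The two runs of the table can be reproduced mechanically; the following sketch (function and exception names are our own) performs the chord-tangent additions over Z_77 and reports the step at which the obstruction occurs, together with the factor revealed by the gcd:

```python
# Toy run of Lenstra's elliptic-curve method on N = 77, reproducing the two
# columns of the table above.
from math import gcd

class Obstruction(Exception):
    """A denominator not invertible mod N: its gcd with N is a factor."""
    def __init__(self, factor):
        self.factor = factor

def inv_mod(d, N):
    d %= N
    g = gcd(d, N)
    if g != 1:
        raise Obstruction(g)       # the obstruction that splits N
    return pow(d, -1, N)

def ec_add(P, Q, A, N):
    """Chord-tangent addition on y^2 = x^3 + A x + B over Z_N."""
    (x1, y1), (x2, y2) = P, Q
    if P == Q:
        lam = (3 * x1 * x1 + A) * inv_mod(2 * y1, N) % N
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, N) % N
    x3 = (lam * lam - x1 - x2) % N
    return (x3, (lam * (x1 - x3) - y1) % N)

def try_curve(G, A, N, steps=20):
    """Compute 2G, 3G, ... until an obstruction reveals a factor of N."""
    P = G
    for n in range(2, steps):
        try:
            P = ec_add(P, G, A, N)
        except Obstruction as e:
            return n, e.factor
    return None

print(try_curve((1, 1), -3, 77))   # (6, 7):  obstruction at 6[1,1]
print(try_curve((1, 5), 21, 77))   # (13, 7): obstruction at 13[1,5]
```

The sketch omits the point at infinity and the random-curve loop of the full method, since neither is needed to reproduce the example.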
10.5.2 Discrete logarithms
The complexity of the discrete logarithm problem depends on the group element
representation and on the prime factor decomposition of the group order.
Let C_N be a cyclic group of order N with generator g. The discrete logarithm is an isomorphism from C_N onto the additive group Z_N.
Representation of group elements. Since all cyclic groups of the same order are isomorphic, the choice of the representation is crucial to obtain sets where the discrete logarithm computation has great complexity (i.e. exponential complexity). The following properties should be considered in this endeavor:

If C_N can be identified with Z_N, the discrete logarithm computation is trivial.

The complexity of discrete logarithm computation in the multiplicative groups C_{2^m - 1} of the finite fields F_{2^m} is certainly less than the complexity of the discrete logarithm in a generic cyclic group of the same order. Let p_1^{m_1} · · · p_s^{m_s} be the prime factorization of the group order M.

The complexity of the discrete logarithm in C_M is not greater than the sum of the complexities of the discrete logarithms in each subgroup C_{p_i^{m_i}} (Pohlig-Hellman's theorem).
The complexity of the discrete logarithm is maximal when the group order
M is prime.
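The Pohlig-Hellman reduction of the last two items can be sketched as follows; the prime p = 211, whose group order 210 = 2 · 3 · 5 · 7 is smooth, and the generator g = 2 are chosen purely for illustration:

```python
# Sketch of the Pohlig-Hellman reduction: the discrete log modulo a smooth,
# squarefree group order M is assembled from discrete logs in the small
# subgroups, combined by the Chinese remainder theorem.
from math import prod

def dlog_small(g, y, p):
    """Brute-force discrete log; affordable only in a small subgroup."""
    x, t = 0, 1
    while t != y:
        t = t * g % p
        x += 1
    return x

def pohlig_hellman(g, y, p, factors):
    """Solve g^x = y (mod p), where M = p-1 = prod(factors), distinct primes."""
    M = prod(factors)
    x = 0
    for q in factors:
        gq = pow(g, M // q, p)          # generates the subgroup of order q
        yq = pow(y, M // q, p)
        r = dlog_small(gq, yq, p)       # gives x mod q
        Mq = M // q                     # Chinese remainder combination
        x = (x + r * Mq * pow(Mq, -1, q)) % M
    return x

p, g = 211, 2                           # 2 is a primitive root mod 211
y = pow(g, 157, p)
print(pohlig_hellman(g, y, p, [2, 3, 5, 7]))   # -> 157
```

The total work is the sum of four tiny brute-force searches, which is exactly why a group of prime order offers the maximal complexity stated above.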
Chapter 11
Cryptography in GSM
Guglielmo Marconi. 1874 - 1937, Italian
physicist, who developed radiotelegraphy and
succeeded in transmitting signals across the
Atlantic (1901): Nobel prize for physics 1909.
11.1
The first radio-mobile systems appeared soon after Guglielmo Marconi invented the radio and demonstrated the possibility of sending electrical signals worldwide. A successful application of wireless telegraphy was in ship-to-shore communications, for positioning and guidance, and also for distress calls. A famous case of the use of telegraphy was on the RMS Titanic, which sank during its maiden voyage from Southampton (UK) to New York City, on April 14th, 1912. Famous, too, was the distress call
CQD (6 times) DE (this is) MGY (6 times) position 41.44N. 50.24W
where:
- the code CQD means a general call to all vessels, indicating that the vessel sending it is in distress and requires immediate assistance. At the time of the sinking, the Marconi company's CQD was still in common use, although it had been officially replaced by the well-known SOS.
- MGY was RMS Titanic's call sign, where M identifies British vessels, although it came to denote Marconi's ground stations, and the remaining two letters identify the vessel.
However, the first radio-mobile systems, as they are understood today, appeared in military contexts, and their effectiveness was confirmed by their deployment during World War II. Certainly, in those troubled times, massive use was made of radio for air traffic control, and for conducting squadrons in air battles, where the interconnections between planes of the same squadron were possible only via wireless phones. The war experience was then exploited in the enormous development of air transportation over the last 70 years or so. However, these systems were far from permitting personal distribution, as later occurred for cellular phones. Big progress in electronic miniaturization was necessary, a process that started at the end of World War II with the invention of the transistor.
As soon as the technology was sufficiently developed, that is, starting from the beginning of the 1970s, cellular phone systems were introduced in the USA, the UK, and Northern European countries. The number of consumers at the end of the 1970s was estimated as a few thousand in European countries, while the USA was ready to take off with 40,000 users and at least 20,000 people on the waiting list.
The available systems were denominated
AMPS (Advanced Mobile Phone Service) in the US
NMT (Nordic Mobile Telephone) in North European Countries
TACS (Total Access Communication System) in the UK.
The first Italian public radio-mobile system was introduced, on an experimental basis, in 1973, by Telecom (formerly SIP). The system, denominated RTM1, was able to supply services of voice conversation and radio news. A major limitation was the impossibility of receiving calls at the mobile terminal: only call announcements could be received. Further, the system was designed for a small number of users in each area served. The VHF working band was located around 160 MHz. This system had a maximum of 5000 subscribers.
In 1983, RTM1 was replaced by a system denominated RTMS, operating at 450 MHz. In RTMS, it was possible to receive calls, since the mobile terminal was identified by a number with prefix 0330. At the beginning of the 1990s, this system reached 100,000 users, when it was replaced by the ETACS system (Extended TACS), which, as the name shows, was derived from the British system. ETACS worked at 900 MHz with cellular coverage and frequency re-use.
ETACS was based on analog FM modulation for sending the voice signal, and
digital FSK modulation for control signaling. Communication was between the
mobile equipment and a base station that served a limited geographical area
called a cell, whose shape is approximately a disc with the base station in the
center.
In ETACS, two fundamental notions that are indispensable for full mobility were definitively introduced:
handover that is the automatic management of call changeover that occurs when
the user crosses the border between two cells, without a loss of the conversation;
localization with a continuous updating of the position of the mobile equipment
(cell phone) in the territory partitioned into cells.
11.2 GSM
GSM, an acronym for Global System for Mobile communications, was developed with the aim of defining a common European standard for mobile communications. The chosen technology was TDMA/FDMA/FDD with a radio link at a communication rate of 270 kbits/sec in a frequency band allocated around 900 MHz. The modulation used was GMSK (Gaussian Minimum Shift Keying) with a separation of 200 kHz between adjacent channels. The voice is digitally encoded by a linear predictive method (LPC). The source rate is 13 kbits/sec, increased to a transmission rate of 22.8 kbits/sec to take care of error correcting and error detecting codes.
11.2.1 Origins
The origins of GSM are more chronicle than history, mainly owing to the disputes, certainly not of a technical nature, pertaining to the definition of the standard itself. The adoption of one technology rather than another meant making the fortune of a firm, or precipitating its decline. In June 1982, at a meeting of CEPT in Vienna, it was decided to set up a working group with the mission of defining the technical characteristics of a pan-European system of cellular communications, in a band around 900 MHz. Actually, such a band had already been specified in 1978, when it was decided to reserve two half-bandwidths of 25 MHz, around the central frequency of 900 MHz, for mobile communications in Europe.
Table: milestones of the GSM standardization (years 1982, 1986, 1987, 1989, 1990, 1991, 1992).
The work of the GSM group lasted for about ten years and followed the model of
CCITT recommendations. The work plan was split into four work packages:
WP1 devoted to defining services;
WP2 devoted to specification of radio transmissions;
WP3 devoted to specification of network architecture, signaling protocols and
open interfaces between different networks;
WP4 devoted to defining the services of data transmission.
The conclusions were contained in a report of more than 5000 pages, and the effect was an extraordinary boom in European mobile telephony. The system offers basic services of telephony and digital data transmission, besides a multitude of possibilities that are of use to users, but that are mainly of a promotional, commercial character:

telephoning, with the possibility of making and receiving calls

roaming throughout Europe, i.e. the possibility of calling and receiving when traveling in different countries

identification of the caller
11.2.2 Communication aspects
GSM is a fully digital communication system. The advantages are many; in particular, the security aspects are more easily managed, as are the quality aspects and the number of services offered. In particular, the digital character allows error control codes to be applied that permit quality data transmission.
GSM transmission is by packets (called BURSTs) of data; to each packet is reserved an interval or SLOT of time; groups of packets are hierarchically organized in FRAMEs. The basic SLOT consists of 156.25 bits, transmitted at 270 kbits/sec. The structure of a SLOT depends on the function of the group of bits that compose the BURST. The normal BURST structure is

ST | Data | H | T | H | Data | SP | G

where

ST: a group of 3 bits of Start
Data: a group of 57 bits of useful data (appearing twice)
H: a group of 1 bit used for distinguishing voice from data (appearing twice)
T: a group of 26 bits for training
SP: a group of 3 bits of Stop
G: a group of 8.25 bits defining a temporal guard interval between BURSTs.
Voice encoding. The voice is converted to digital by the LPC (Linear Prediction
Coding) technique, which considers slots of 20 msec of speech. It is known that
the alteration (due to disturbance or noise during the transmission) of bits in this
kind of encoding has different effects on the final reproduction of the voice, which
depend on the meaning of the bit corrupted. Therefore, different strengths of
protection are necessary to combine economy of transmission and final quality of
voice. The strategy is to partition into classes the bits corresponding to a frame of
20 msec. Thus, the 50 bits of the most sensitive classes are protected with a CRC code (which adds 3 parity-check bits to the 50 bits); a convolutional code with rate 1/2 then encodes the resulting 53 bits together with the further 132 protected bits. The sampling process produces 160 samples, for a total of 2,080 bits every 20 msec. LPC compression reduces the total number to 260 bits, of which 182 are encoded and the remaining 78 are transmitted interleaved with these, at a rate of 22.8 kbits/sec in the Data field of the Normal BURST.
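The bit accounting of the scheme can be checked directly. Note that 4 tail bits for the convolutional code, which the text does not mention, must be assumed to make the count match the stated 22.8 kbits/sec:

```python
# Bit accounting for one 20 msec GSM full-rate speech frame, following the
# scheme above; the 4 tail bits are our assumption, needed to reach the
# stated 22.8 kbits/sec transmission rate.
class1a, crc = 50, 3           # most sensitive bits + CRC parity checks
class1b, tail = 132, 4         # remaining protected bits + code tail bits
class2 = 78                    # bits transmitted without protection

assert class1a + class1b + class2 == 260       # LPC output per 20 msec

coded = 2 * (class1a + crc + class1b + tail)   # rate-1/2 convolutional code
total = coded + class2
rate_kbps = total * 50 / 1000                  # 50 frames of 20 msec per second
print(total)        # -> 456 bits per frame
print(rate_kbps)    # -> 22.8
```

Each 456-bit frame is then interleaved over eight half-bursts before transmission.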
11.2.3 Authentication and confidentiality
When a mobile station MS is switched on, it receives from the provider, via the entry BS, a random number RAND.
MS, utilizing the algorithm A3, which realizes a one-way function, computes
RES = A3(RAND, KI) ,
where KI is a private key of user authentication, known only to the user and
the provider. This secret key is stored in the SIM card in a protected way, and also
stored in the service center of the network provider.
The result RES is sent back to the provider service center for a challenge: the center does the same computation and obtains res (this is possible because the service center knows all the parameters). If res = RES, then the service center sends back an ack, so that the user is enrolled in the network. Of course, if RES ≠ res, access is denied. The authentication process, if successful, also provides BS with the (secret) synchronization key Kc (called the communication key) for the stream cipher that encrypts messages in the air. The BS must decrypt the encrypted messages received from MS, because the message (voice or data) must be clear when it enters the ground wired network.
This communication key Kc is generated by the algorithm A8, which implements
a one-way function that combines RAND (known to all) and KI (secret user identifier)
Kc = A8(RAND, KI) .
This key Kc need not be communicated to MS, because MS already has all the information required for its computation, namely RAND, its secret identifier KI, and the algorithm A8 implemented on the cell phone.
Confidentiality of user information. In the air, the confidentiality of user information is achieved by encrypting the bit stream by means of a stream cipher, which is driven by the secret communication key Kc. The stream generator produces a binary stream (the encryption key) which is summed bit by bit with the bits of the digitalized message, before the modulator. The key stream is produced by an algorithm denoted A5. The structure of the generator implementing A5, shown in Figure 11.1, consists of 3 LFSRs with clock control. The generator polynomials and the position of the clock-control bit for each LFSR are shown in the following table:
 LFSR length | Generator polynomial       | Clock-control bit
     19      | x^19 + x^5 + x^2 + x + 1   |        8
     22      | x^22 + x + 1               |       10
     23      | x^23 + x^15 + x^2 + x + 1  |       10
The stream generator is initialized at each BURST, and every LFSR is fed according to a defined strategy. At each step, the clock-controlled LFSRs advance according to a majority rule: each register is clocked if and only if its clock-control bit agrees with the majority value of the three clock-control bits, so that at least two registers advance at every step.

Figure 11.1: A5 key-stream generator: three clock-controlled LFSRs, of lengths 19, 22 and 23, whose output is added to the LPC-encoded voice to produce the encrypted voice.
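A majority-clocked generator of this shape can be sketched as follows; bit-ordering and output conventions are simplified, so this illustrates the clocking principle rather than reproducing A5/1 faithfully:

```python
# Sketch of a majority-rule clocked generator with the three LFSRs of the
# table (lengths 19, 22, 23).  Bit conventions are simplified: this is an
# illustration of the clocking principle, not a faithful A5/1.
REGS = [(19, (5, 2, 1, 0), 8),        # (length, feedback taps, clock bit)
        (22, (1, 0), 10),
        (23, (15, 2, 1, 0), 10)]

def step(state, taps, length):
    fb = 0
    for t in taps:                    # feedback = XOR of the tap bits
        fb ^= (state >> t) & 1
    return ((state << 1) | fb) & ((1 << length) - 1)

def keystream(states, nbits):
    out = []
    for _ in range(nbits):
        clocks = [(s >> c) & 1 for s, (_, _, c) in zip(states, REGS)]
        maj = 1 if sum(clocks) >= 2 else 0
        for i, (length, taps, c) in enumerate(REGS):
            if ((states[i] >> c) & 1) == maj:   # majority rule: advance
                states[i] = step(states[i], taps, length)
        bit = 0
        for s, (length, _, _) in zip(states, REGS):
            bit ^= (s >> (length - 1)) & 1      # output: XOR of the top bits
        out.append(bit)
    return out

ks = keystream([0x5A5A5, 0x2AAAAA, 0x3C3C3C], 114)  # illustrative nonzero init
print(ks[:16])
```

In GSM the registers are reloaded from Kc and the frame number at every burst, and 114 key-stream bits are produced per direction; here the initial states are arbitrary nonzero values.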
11.3 Conclusions
The choice of the algorithms A3, A8, and A5 has been at the center of numerous disagreements over security. It is frequently claimed that A5 may be easily broken (actually, there are at least two similar algorithms, denoted A5/1 and A5/2). However, assuming that such protection is active to conceal all conversations traveling in the air, there is no serious motivation for using stronger protection mechanisms, given that at every base station the voice is decrypted and goes in clear on the wires. The fortune of GSM, with respect to the previous generations TACS, ETACS etc., has been due to the possibility of transmitting data, to the digitalization of the voice, and, most importantly, to the sufficient strength of the algorithm A3 and of the challenge protocol for accessing (or being enrolled in) the network when the cell phone is switched on. Even if the system is breakable with a brute force attack, without intelligence or side information, does the price pay off? In the most fortunate cases, with today's technology, breaking the system takes days, many resources, and the very high costs of specialized personnel.
The Committee decided not to publish the algorithm A3, motivated by obscure reasons of security. In the book The GSM System for Mobile Communications by M. Mouly and M.B. Pautet, on page 478, we read:

... The algorithm describing the computation is referred to as algorithm A3 in the Specifications but its specification cannot be found there.
... Such algorithms are usually kept secret.
However, it is clear that the reasons were more political than technical. The validity of an algorithm actually lies in its public availability rather than in its secrecy. The adage "Secret of three, secret of everybody" is absolutely true, but the choices of the big organizations cannot be discussed, and must only be accepted. About the strength of these algorithms, there have been many disputes and claims that the algorithms could be easily broken. Nevertheless, the real issue concerning the weakness of these algorithms is not their intrinsic quality, as history teaches. The issue lies in the protocols used in their deployment; these protocols generally have some very, very weak points and may contain hidden bugs. For massive applications at very low level, the mathematical robustness of the algorithms is the last concern.
The adoption of relatively weak algorithms in GSM is supported by several reasons:

The algorithms should mainly play a deterrent role.

The algorithms should be easy to implement and should not be energy-consuming.

The algorithms should be breakable by national security agencies, in view of the principle that the security of the individual citizen should not affect the security of the nation.

The algorithm A5 must be universal, independent of the provider or nation, because it must run on all cell phones of all manufacturers, and on all base stations in the world.
Meanwhile, the general philosophy is wise enough to permit an independent choice of the algorithm A3 (the authentication algorithm) by the provider, who may adopt a proprietary algorithm as strong as he likes, in order to protect his economic interests. Experience has shown that the A3 algorithm, together with the authentication protocol, is sufficiently strong to guarantee the success of GSM: it has led to its world dominance, a domination that is indisputable and above any controversy.
GSM is a dominating phone system, but mobile phones have undergone an enormous evolution, up to smart-phones, which are essentially computers oriented to exploiting the potentialities of the Internet; their security is essentially based on the security of the Internet and of its providers.
Chapter 12
Steganography
Imagination was given to man to compensate
him for what he is not; a sense of humor to console him for what he is.
Francis Bacon
12.1 Introduction
p_g, turns out to be a function of the primary code, the bit error probability of the primary channel, and the principle adopted for identifying the stega-bits. Section 12.4 analyses a simple yet concrete implementation of this scheme, evaluates its performance, and discusses some critical cryptographic issues. Lastly, Section 12.5 draws some conclusions.
12.2
Herodotus (c. 486-425 BC) in his Historiae [37] gives several examples of encrypted messages, and two particular examples of steganographic methods. Histiaeus, tyrant of Miletus, wanted to send a message from Susa (Persia) to instigate a revolt in Ionia against the Persians, in order to be sent to stop the rebellion and thus acquire merit with the king (a typical political game in any age). Of course, the message had to be kept secret from the Persian king. Histiaeus shaved the head of his most trusted slave and tattooed it with the message, which disappeared after the hair had regrown. Apparently [43, 44], the method was also used by some Germans at the beginning of the 20th century.
The second example of steganography given by Herodotus concerns Demaratus, a Greek king of Sparta who was deposed and exiled, and who went to the court of the Persian king Darius. Herodotus credits Demaratus with having warned the Spartans of an imminent invasion by Xerxes, Darius' successor. To send his secret message, he removed the wax from a writing tablet, wrote his message on the wood, and restored the wax: the tablet looked like a blank one. The legend continues, telling that it was King Leonidas' wife who discovered the trick and decrypted Demaratus' message, allowing the Spartans to stop the great Persian army at Thermopylae for long enough for the remaining Greeks to organize an army that defeated Xerxes.
The general idea is to embed the secret information (stega-message) in other information that is innocent (dummy) with respect to the private stega-message. An old technique was to mark, with almost invisible signs, those letters of a text which composed the stega-message. This trick may be easily discovered, especially if the stega-text is not properly encoded; however, it has been in use for centuries. A further step forward was to use invisible ink to mark the letters, or to write invisible messages in the blank spaces of innocent or faked messages. There are very many types of ink:
1. Inks that can be developed by chemical reactions.
2. Inks visible under ultraviolet light.
3. Inks developed by heat.
Today's uses of this sort of expedient include anti-counterfeiting, property marking, children's games, and marking for the purpose of identification in manufacturing. UV markers, with fluorescent ink that glows when illuminated with a UV light, may be used to invisibly mark valuable items as a safeguard against burglary. Security marker pens of this sort can be obtained commercially and are widely used as a crime countermeasure.
A method for hiding information in texts, without any physical alteration of the paper, is the acrostic. There are many acrostic poems. Probably the first was due to Boccaccio (1300), and probably his L'amorosa Visione is still the most gigantic acrostic: written in terzine, the initial letters of all the triplets throughout the work compose three poems of considerable length. An example is the short poem An Acrostic by Edgar Allan Poe (1829):
Elizabeth it is in vain you say
Love not thou sayest it in so sweet a way:
In vain those words from thee or L. E. L.
Zantippe's talents had enforced so well:
Ah! if that language from thy heart arise,
Breathe it less gently forth and veil thine eyes.
Endymion, recollect, when Luna tried
To cure his love was cured of all beside
His folly pride and passion for he died.
The initials of the verses form the name ELIZABETH. Obviously, the position of the acrostic, when used to hide information, must not be so easily identified, a point already realized, and exploited, in ancient China. A more famous example from the literature comes from the Hypnerotomachia Poliphili (1499) [44], written by an anonymous author. An acrostic formed by the first letters of the 38 chapters reveals (in Latin) an affair between a monk and a woman:

Poliam frater Franciscus Columna peramavit.
In the 16th and 17th centuries, a large amount of literature on steganography appeared. Trithemius occupies a prominent position: he wrote an extensive treatise, the Steganographia, printed in 1606, dealing with occultism, cryptography, and
steganography. In the book he proposes an extremely complex steganographic
code, which was expanded by Gaspar Schottus, a Jesuit, in his Schola Steganographica (1665).
Human fantasy knows no limit when it comes to devising ways to insert hidden
information, in pursuit of the most diverse goals: from vendors who want to control
resale, to copyright protection, from lovers' messages to spying on sensitive information. However, in modern cryptography, mathematical models, and what
can and cannot be achieved purely by information manipulation, tend to dominate the scene. In this mathematical world, we have private-key and public-key
steganography. An example of private-key steganography that can be used in
telecommunication systems will be the object of the remaining part of this chapter. We close this section by saying that, informally, a public-key steganography
scheme is intended to allow two parties (Alice and Bob), who have not exchanged
a secret key, to send hidden messages over a public channel, so that an adversary
(Eve) cannot even detect that these hidden messages are being sent.
12.3 Stega-channel

[Figure 12.1: Communication Channel with Error-Correcting Codes and the Stega-channel]
2. Cover channel bit error probability p
3. The decoding rule D.
4. The detection criteria in Mode 2.
The bit error probability pg is thus defined as follows

    p_g = \sum_{x=0,1} \sum_{c \in C} \sum_{e \in F_2^n} p\{ \hat{x} \neq x \,|\, c, e, x \} \, p_t(x) \, p(c) \, p(e)

where
- p_t(x) is the probability of sending a stega-bit x;
- p(c) is the probability of sending a code word c;
- p(e) = p^{w_H(e)} (1-p)^{n - w_H(e)} is the probability that an error pattern e of Hamming weight w_H(e) occurs;
- \hat{x} is the stega-bit detected.
Let L_i denote the detection rule of Mode i that extracts the stega-bit \hat{x} from a detected error pattern; thus we may rewrite the expression for p_g as

    p_g = \sum_{x=0,1} \sum_{c \in C} \sum_{e \in F_2^n} p\{ L_i(e) \neq x \,|\, c, e, x \} \, p_t(x) \, p(c) \, p(e) .

Letting \ell(e) denote the coset leader of the coset containing the error pattern e, we have

    p_g = \sum_{x=0,1} \sum_{c \in C} \sum_{e \in F_2^n} p\{ L_i(e + \ell(e)) \neq x \,|\, c, e, x \} \, p_t(x) \, p(c) \, p(e) .
In particular, referring to Mode 1, the average over the transmitted code words can
be computed and the equation simplifies to

    p_g = \sum_{x=0,1} \sum_{e \in F_2^n} p\{ L_1(e + \ell(e)) \neq x \,|\, e, x \} \, p_t(x) \, p(e) .

For equiprobable code words, the average over the code reads

    p_g = \frac{1}{2^k} \sum_{c \in C} p_g(c) .        (12.1)
12.4 Concealment issues
If the channel noise is negligible, no genuine error occurs and the stega-bits are
easily recovered, but conversely the stega-channel is easily detected. Therefore,
12.4.1
(that is, a translation of the repetition code) so that the all-zeros and the all-ones
code words are excluded, and transmission of a stega-bit 0 or 1 is possible without
exception. Moreover, we know that pe and pc do not depend on which stega-bit is sent, but both probabilities depend on the code word used; thus they are
computed assuming a 0 stega-bit is sent. In this case some undecidable
situations also occur, which may be seen as erasures; thus an erasure probability
pu is computed as well. The standard array is
    leaders   000   010   001   100
              100   110   101   000
              011   001   010   111 ;

the words in each row are decoded into the code word in the first position of the
row.
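The decoding rule described by the standard array amounts to a table lookup. A minimal Python sketch for this code (variable names are illustrative, not from the text):

```python
# Standard-array decoding for the code C = {100, 011}, a translation of the
# repetition code. Each received word is decoded to the code word of its coset.
code = ["100", "011"]
leaders = ["000", "010", "001", "100"]

def xor(a, b):
    # Bitwise sum over F_2 of two binary strings.
    return "".join(str(int(u) ^ int(v)) for u, v in zip(a, b))

# Decoding map: code word + coset leader -> code word.
decode = {xor(c, l): c for c in code for l in leaders}

# The detected error pattern is the difference between received and decoded word.
received = "110"
print(decode[received], xor(received, decode[received]))  # 100 010
```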
1. If (1, 0, 0) is the code word used, and the 0 stega-bit is sent using the second
entry, the eight possible situations corresponding to genuine error patterns
are reported in the following table, where column I contains the genuine
error patterns, column II the received words, column III the decoded code
words, column IV the position of the detected error and the type of event, with E meaning
erroneous stega-bit, U erased stega-bit, and C correct stega-bit, and column
V the probability of the event:
      I     II    III    IV      V
     000   110   100    2 C    (1-p)^3
     111   001   011    2 C    p^3
     100   010   011    3 C    p(1-p)^2
     011   101   100    3 C    p^2(1-p)
     110   000   100    1 E    p^2(1-p)
     001   111   011    1 E    p(1-p)^2
     101   011   011    0 U    p^2(1-p)
     010   100   100    0 U    p(1-p)^2
2. If (0, 1, 1) is the code word used, the configurations are summarized in the
following table, as in the previous case:

      I     II    III    IV      V
     000   111   011    1 C    (1-p)^3
     111   000   100    1 C    p^3
     100   011   011    0 U    p(1-p)^2
     011   100   100    0 U    p^2(1-p)
     110   001   011    2 E    p^2(1-p)
     001   110   100    2 E    p(1-p)^2
     101   010   011    3 E    p^2(1-p)
     010   101   100    3 E    p(1-p)^2
Summarizing, we have

    p_e = \frac{1}{2} (3p(1-p)^2 + 3p^2(1-p))
    p_c = \frac{1}{2} (2p^3 + 2(1-p)^3 + p(1-p)^2 + p^2(1-p))
    p_u = \frac{1}{2} (2p(1-p)^2 + 2p^2(1-p))

The probability p_g is obtained by splitting p_u evenly between p_e and p_c, and is

    p_g = 2p(1-p) .
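These expressions can be checked numerically. The short Python sketch below verifies that the three probabilities sum to 1 and that splitting pu evenly indeed yields pg = 2p(1-p):

```python
# Probabilities of the Example 1 stega-channel, as functions of the
# cover-channel bit error probability p.
def probs(p):
    q = 1 - p
    pe = 0.5 * (3*p*q**2 + 3*p**2*q)                 # erroneous stega-bit
    pc = 0.5 * (2*p**3 + 2*q**3 + p*q**2 + p**2*q)   # correct stega-bit
    pu = 0.5 * (2*p*q**2 + 2*p**2*q)                 # erased stega-bit
    return pe, pc, pu

for p in (0.01, 0.1, 0.3):
    pe, pc, pu = probs(p)
    assert abs(pe + pc + pu - 1) < 1e-12          # the three events are exhaustive
    assert abs(pe + pu/2 - 2*p*(1 - p)) < 1e-12   # pg = pe + pu/2 = 2p(1-p)
```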
Example 2. Consider a cover channel employing a BCH (31, 16, 7) code correcting three errors. In this case, the exact computation is infeasible, and therefore
we consider the approximate computation and compare the results with numerical simulations.
Mode 1: For computational purposes, it is not restrictive to assume that the
steganographic bit is sent using the first position in a code word. In this mode,
the computation of pe and pc depends on whether the stega-bit is 0 or 1, which
are assumed to be sent with the same probability 1/2. We have
1. If 0 is sent, then p_e(0) is the probability that an error affects the position of
the stega-bit and the total number of errors is at most three:

       p_e(0) = p(1-p)^{30} + 30 p^2 (1-p)^{29} + \frac{30 \cdot 29}{2} p^3 (1-p)^{28} .

   If 1 is sent, p_e(1) is equal to the probability that one error affects the position
of the stega-bit and at most three errors affect the other positions within the
code word:

       p_e(1) = p(1-p)^{30} + 30 p^2 (1-p)^{29} + \frac{30 \cdot 29}{2} p^3 (1-p)^{28} + \frac{30 \cdot 29 \cdot 28}{6} p^4 (1-p)^{27} .
2. If 0 is sent, p_c(0) is equal to the probability that at most three errors affect
positions within the code word, excluding the stega-bit position:

       p_c(0) = (1-p)^{31} + 30 p (1-p)^{30} + \frac{30 \cdot 29}{2} p^2 (1-p)^{29} + \frac{30 \cdot 29 \cdot 28}{6} p^3 (1-p)^{28} .

   If 1 is sent, p_c(1) is the probability that at most two errors hit positions within
the code word, excluding the stega-bit position:

       p_c(1) = (1-p)^{31} + 30 p (1-p)^{30} + \frac{30 \cdot 29}{2} p^2 (1-p)^{29} .
Summarizing, we have

    p_e = p(1-p)^{30} + 30 p^2 (1-p)^{29} + \frac{30 \cdot 29}{2} p^3 (1-p)^{28} + \frac{30 \cdot 29 \cdot 28}{12} p^4 (1-p)^{27}

    p_c = (1-p)^{31} + 30 p (1-p)^{30} + \frac{30 \cdot 29}{2} p^2 (1-p)^{29} + \frac{30 \cdot 29 \cdot 28}{12} p^3 (1-p)^{28} ;
thus, the bit error probability p_g of the BSC model for Mode 1 is

    p_g = \frac{1}{2} - \frac{1}{2} (1-p)^{31} - \frac{29}{2} p (1-p)^{30} - \frac{405}{2} p^2 (1-p)^{29} - \frac{1595}{2} p^3 (1-p)^{28} + 1015 \, p^4 (1-p)^{27} .
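The Mode 1 closed form can be cross-checked: since pe + pc + pu = 1 and the undecidable cases are split evenly (as in Example 1), pg = 1/2 + (pe - pc)/2. A short Python sketch of this numerical check (not part of the original derivation):

```python
# Numerical check of the Mode 1 closed form for the BCH(31,16,7) stega-channel.
from math import comb

def pe_pc(p):
    # Averaged probabilities pe and pc as given above.
    q = 1 - p
    pe = p*q**30 + 30*p**2*q**29 + comb(30, 2)*p**3*q**28 + (30*29*28/12)*p**4*q**27
    pc = q**31 + 30*p*q**30 + comb(30, 2)*p**2*q**29 + (30*29*28/12)*p**3*q**28
    return pe, pc

def pg_mode1(p):
    # Closed form: 1/2 - (1/2)(1-p)^31 - (29/2)p(1-p)^30
    #              - (405/2)p^2(1-p)^29 - (1595/2)p^3(1-p)^28 + 1015 p^4(1-p)^27
    q = 1 - p
    return (0.5 - 0.5*q**31 - 14.5*p*q**30 - 202.5*p**2*q**29
            - 797.5*p**3*q**28 + 1015*p**4*q**27)

for p in (0.001, 0.01, 0.05):
    pe, pc = pe_pc(p)
    assert abs(0.5 + (pe - pc)/2 - pg_mode1(p)) < 1e-12
```

Expanding 1/2 + (pe - pc)/2 term by term reproduces the coefficients 29/2, 405/2, 1595/2, and 1015 of the closed form.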
Mode 2.
where W_{yy,xx}(1,1) is the multiple partial derivative of W(x,y) with respect to
the subset of subscript variables, evaluated at x = y = 1.
The approximate error probability of the BSC modeling a stega-channel with
Mode 2 is

    p_g = \frac{1}{2} - \frac{1}{2} (1-p)^{31} - \frac{29}{4} p (1-p)^{30} - \frac{403}{4} p^2 (1-p)^{29} + \frac{29}{4} p^3 (1-p)^{28} + \frac{405}{4} p^4 (1-p)^{27} .
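The Mode 1 and Mode 2 closed forms for pg can be compared numerically. The following Python sketch simply restates the two polynomials and evaluates them at a few values of the cover-channel error probability p:

```python
# Stega-channel bit error probability pg for the BCH(31,16,7) cover code,
# Mode 1 versus Mode 2, as functions of the cover-channel error probability p.
def pg_mode1(p):
    q = 1 - p
    return (0.5 - 0.5*q**31 - (29/2)*p*q**30 - (405/2)*p**2*q**29
            - (1595/2)*p**3*q**28 + 1015*p**4*q**27)

def pg_mode2(p):
    q = 1 - p
    return (0.5 - 0.5*q**31 - (29/4)*p*q**30 - (403/4)*p**2*q**29
            + (29/4)*p**3*q**28 + (405/4)*p**4*q**27)

for p in (0.001, 0.005, 0.01):
    print(f"p={p}: Mode 1 pg = {pg_mode1(p):.2e}, Mode 2 pg = {pg_mode2(p):.2e}")
```

At p = 0.001 the Mode 1 value is roughly an order of magnitude below the Mode 2 value, in line with the remark in the conclusions that Mode 1 achieves a much better pg.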
12.5 Conclusions
Error-correcting codes over communication channels may be used as cover objects for secret conversations. The steganographic information is sent as artificial
errors on the channel, and it is undetectable provided that the artificial errors do
not affect the rate of the genuine errors, in the sense that the artificial error rate is
indistinguishable from random variations of the genuine error rate.
An interesting observation is that the performance of the stega-system principally
depends on the error-correcting capabilities of the primary code, rather than on the code
itself, and it is also to some extent independent of the code rate.
Two modes of inserting artificial errors, denoted Mode 1 and Mode 2, have been
considered and compared. Although Mode 1 is incomparably better than Mode
2 in terms of the achievable pg, Mode 2 does not need synchronization, and it
seems to be less easily detectable.
Bibliography
[1] M. Agrawal, N. Kayal, N. Saxena, PRIMES is in P, Ann. Math. 160, 2004, 781-793.
[2] E. Bach, J. Shallit, Algorithmic Number Theory, vol. 1, MIT Press, Cambridge
(MA), 1996.
[3] G.V. Bard, Algebraic Cryptanalysis, Springer, New York, 2009.
[4] E. R. Berlekamp, Algebraic Coding Theory, McGraw-Hill, New York, 1968.
[5] I. Blake, G. Seroussi, N. Smart, Elliptic Curves in Cryptography, Cambridge
Univ. Press, Cambridge, 1999.
[6] I. Blake, G. Seroussi, N. Smart, Advances in Elliptic Curves Cryptography, Cambridge University Press, Cambridge, 2005.
[7] A. Borodin, I. Munro, The Computational Complexity of Algebraic and Numeric
Problems, Elsevier, New York, 1975.
[8] J.A. Buchmann, Introduction to Cryptography, Springer, New York, 2000.
[9] K. W. Campbell, M. J. Wiener, DES is not a Group, Advances in Cryptology -
CRYPTO '92, Proceedings of the 12th Annual International Cryptology Conference,
Springer, London, 1993, 512-520.
[10] J.W.S. Cassels, Lectures on Elliptic Curves, Cambridge Univ. Press, Cambridge,
1991.
[11] K. Cattell, J.C. Muzio, Analysis of One-Dimensional Linear Hybrid Cellular
Automata over Fq , IEEE Trans. on Computers, Vol. 45, N.7, July 1996, 782-792.
[12] D. Chaum, Blind signatures for untraceable payments, Advances in Cryptology Proceedings of Crypto 82, Plenum, 3, 1983, 199-203.
[13] C. Christensen, Polish Mathematicians Finding Patterns in Enigma Messages, Mathematics Magazine, Vol 80, N.4, October 2007, 247-273.
[14] F. Chun-I, W. Lin-Chuan, W. V. Shi-Ming, Cryptanalysis on Chen-Qui-Zheng
Blind Signature Scheme, Applied Math. Sciences, Vol. 8(16), 2008, 787-791.
[33] C.F. Gauss, Disquisitiones Arithmeticae, Göttingen (1801); English translation
by A. Clarke, revised by W. Waterhouse, Springer, New York, 1986.
[34] S. W. Golomb, Shift Register Sequences, Aegean Park Press, Laguna Hills,
1982.
[35] U. Hansmann, M.S. Nicklaus, T. Schack, F. Seliger, Smart Card Application
Development Using Java, Springer, New York, 1999.
[36] G.H. Hardy, E.M. Wright, An Introduction to the Theory of Numbers, Oxford
Univ. Press, Oxford, 1971.
[37] Herodotus, The Histories (Historiae), Penguin Books, London, 2003.
[38] L. S. Hill, Cryptography in an Algebraic Alphabet, The American Mathematical Monthly, vol. 36, June-July 1929, 306-312.
[39] J. Hoffstein, J. Pipher, J.H. Silverman, An introduction to mathematical cryptography, Springer, New York, 2008.
[40] R.A. Horn, C.R. Johnson, Matrix analysis, Cambridge Univ. Press, Cambridge, 1999.
[41] L.K. Hua, Introduction to Number Theory, Springer, New York, 1982.
[42] D. Jungnickel, Finite Fields - Structure and Arithmetics, Wissenschaftsverlag,
Mannheim, 1993.
[43] D. Kahn, The Codebreakers: The Story of Secret Writing, Scribner, New York,
1996.
[44] S. Katzenbeisser, F. A. P. Petitcolas, Information Hiding: Techniques for Steganography and Digital Watermarking, Artech House, Boston, 2000.
[45] B. Kedem, Binary Time Series, Marcel Dekker, New York, 1980.
[46] D.E. Knuth, The Art of Computer Programming, Fundamental Algorithms, vol.
I, Addison-Wesley, Reading (MA), 1967.
[47] D.E. Knuth, The Art of Computer Programming, Seminumerical Algorithms,
vol. II, Addison-Wesley, Reading (MA), 1981.
[48] N. Koblitz, A Course in Number Theory and Cryptography, Springer-Verlag,
New York, 1987.
[49] N. Koblitz, Algebraic Aspects of Cryptography, Springer, New York, 1999.
[50] N. Koblitz, Introduction to Elliptic Curves and Modular Forms, Springer, New
York, 1984.
[51] K. Kurosawa, W. Ogata, Efficient Rabin-type Digital Signature Scheme, Designs, Codes and Cryptography, 16, 1999, 53-64.
[52] R. Lidl, H. Niederreiter, Finite Fields, Addison-Wesley, Reading (MA), 1983.
[53] H. Lysing, Secret Writing - An Introduction to Cryptograms, Ciphers and Codes,
Dover, New York, 1974.
[54] F.J. MacWilliams, N.J.A. Sloane, The Theory of Error-Correcting Codes, North
Holland, Amsterdam, 1977.
[55] J.L. Massey, Shift-Register Synthesis and BCH decoding, IEEE Trans. on Inform. Th., IT-15, 1969, 122-127.
[56] A.J. Menezes, Elliptic Curve Public Key Cryptosystems, Kluwer Academic Publishers, Boston, 1993.
[57] A.J. Menezes, P.C. van Oorschot, S.A. Vanstone, Handbook of Applied Cryptography, CRC Press, New York, 1997.
[58] M. Mignotte, Mathematics for Computer Algebra, Springer, New York, 1992.
[59] R. Mollin, An Introduction to Cryptography, CRC, NY, 2007.
[60] NBS-SHA, Secure Hash Standard (SHS), FIPS publication 180-2, National Bureau of Standards, 2003.
http://carc.nist.gov/publications/fips/fips180-2/fips180-2.pdf
[61] J. Pieprzyk, T. Hardjono, J. Seberry, Fundamentals of Computer Security,
Springer, New York, 2003.
[62] M. Rabin, Digitalized signature as intractable as factorization, Technical Report MIT/LCS/TR-212 , MIT Laboratory for Computer Science, 1978.
[63] R. Rivest, A. Shamir, L. Adleman, A Method for Obtaining Digital Signatures
and Public-Key Cryptosystems, Comm. of ACM, vol. 21, Feb. 1978, 120-126.
[64] M. Rosing, Implementing Elliptic Curve Cryptography, Manning Publication
Co., Greenwich (CT), 1999.
[65] R.A. Rueppel, Analysis and Design of Stream Ciphers, Springer, New York,
1986.
[66] B. Schneier, Applied Cryptography, Wiley, New York, 1995.
[67] R. Schoof, Elliptic Curves Over Finite Fields and the Computation of
Square Roots mod p, Mathematics of Computation, vol. 44, number 170, April
1985, 483-494.
[68] C.E. Shannon, A Mathematical Theory of Communication, BSTJ, vol. 27,
1948, 379-423 and 623-656.
[69] C.E. Shannon, Communication Theory and Secrecy Systems, BSTJ, vol. 28,
1949, 656-715.
[70] J.H. Silverman, A friendly Introduction to Number Theory, Prentice-Hall, Upper
Saddle River, 2001.
[71] J.H. Silverman, The Arithmetic of Elliptic Curves, Springer, New York, 1986.
[72] A. Sinkov, Elementary Cryptanalysis, MAA, Washington, 1966.
[73] J.H. Silverman, J. Tate, Rational Points on Elliptic Curves, Springer, New York,
1992.
[74] A. Sinkov, Elementary Cryptanalysis, MAA, Washington, 1966.
[75] Sung-Jin Cho, Un-Sook Choi, Ham-Doo Kim, Yoon-Hee Hwang, Jin-Gyoong
Kim, Seong-Hun Heo, New Synthesis of One-Dimensional 90/150 Linear
Hybrid Group Cellular Automata, IEEE Trans. on Computer-Aided Design of
Integrated Circuits and Systems, Vol. 26, N.9, Sept. 2007, 1720-1724.
[76] C. Swenson, Modern Cryptanalysis, Wiley, Indianapolis (IN), 2008.
[77] H. van Tilborg, An Introduction to Cryptology, Kluwer Academic Publ.,
Boston, 1988.
[78] F. Tricomi, Funzioni Ellittiche, Zanichelli, Bologna, 1951.
[79] A. Vardy, The intractability of computing the minimum distance of a code,
IEEE Trans. on Inform. Theory, Vol. 43, N.6, 1997, 1757-1766.
DOI: 10.1109/18.641542
[80] L.C. Washington, Elliptic Curves, Number Theory and Cryptography, Chapman
& Hall, Boca Raton, 2003.
[81] S. Wicker, V.K. Bhargava, Reed-Solomon Codes and Their Applications, IEEE Press,
New York, 1994.
[82] H. C. Williams, A modification of the RSA public key encryption procedure,
IEEE Trans. Inf. Theory, Vol. 26(6), 1980, 726-729.
[83] D. Zheng, K. Chen, W. Qiu, New Rabin-like Signature Scheme, Workshop
Proceedings of the Seventh International Conference on Distributed Multimedia Systems, Knowledge System Institute, 2001, 185-188.
[84] M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions, Dover, New
York, 1968.