555 Spring12 Topic02
555 Spring12 Topic02
555 Spring12 Topic02
CS 555
CS555 Topic 2 1
Lecture Outline
• Basics of probability
• Vigenere cipher.
• Attacks on Vigenere: Kasisky
Test and Index of Coincidence
• Cipher machines: The Enigma
machine.
• Required readings:
– Katz & Lindell: 1.1 to 1.3
• Recommended readings
– The Code Book by Simon Singh
CS555 Topic 2 2
Begin Math
CS555 Topic 2 3
Random Variable
Definition
A discrete random variable, X, consists of a finite set
X, and a probability distribution defined on X. The
probability that the random variable X takes on the
value x is denoted Pr[X =x]; sometimes, we will
abbreviate this to Pr[x] if the random variable X is fixed.
It must be that
0 Pr[ x] for all x X
Pr[ x] 1
xX
CS555 Topic 2 4
Example of Random Variables
• Let random variable D1 denote the outcome of throw one
dice (with numbers 0 to 5 on the 6 sides) randomly, then
D={0,1,2,3,4,5} and Pr[D1=i] = 1/6 for 0 i 5
• Let random variable D2 denote the outcome of throw a
second such dice randomly
• Let random variable S1 denote the sum of the two dices,
then S ={0,1,2,…,10}, and Pr[S1=0]
= Pr[S1=10] = 1/36 Pr[S1=1] =
Pr[S1=9] = 2/36 = 1/18 …
• Let random variable S2 denote the sum of the two dices
modulo 6, what is the distribution of S2
CS555 Topic 2 5
Relationships between Two Random
Variables
Definitions
Assume X and Y are two random variables,
then we define:
- joint probability: Pr[x, y] is the probability that
X takes value x and Y takes value y.
- conditional probability: Pr[x|y] is the probability
that X takes on the value x given that Y takes
value y.
Pr[x|y] = Pr[x, y] / Pr[y]
- independent random variables: X and Y
are said to be independent if Pr[x,y] = Pr[x]P[y],
for all x X and all y Y.
CS555 Topic 2 6
Examples
CS555 Topic 2 8
Bayes’ Theorem
Bayes’ Theorem
If P[y] > 0 then
P[ x ]P[ y | x ]
P[ x | y ]
P[ y ]
Corollary
X and Y are independent random variables
iff P[x|y] = P[x], for all x X and all y Y.
CS555 Topic 2 9
End Math
CS555 Topic 2 10
Ways to Enhance the Substitution
Cipher against Frequency Analysis
• Using nulls
– e.g., using numbers from 1 to 99 as the ciphertext
alphabet, some numbers representing nothing and are
inserted randomly
• Deliberately misspell words
– e.g., “Thys haz thi ifekkt off diztaughting thi ballans off
frikwenseas”
• Homophonic substitution cipher
– each letter is replaced by a variety of substitutes
• These make frequency analysis more difficult,
but not impossible
CS555 Topic 2 11
Towards the Polyalphabetic
Substitution Ciphers
• Main weaknesses of monoalphabetic substitution
ciphers
– In ciphertext, different letters have different frequency
• each letter in the ciphertext corresponds to only one letter in the
plaintext letter
• Idea for a stronger cipher (1460’s by Alberti)
– Use more than one cipher alphabet, and switch between them
when encrypting different letters
• As result, frequencies of letters in ciphertext are similar
• Developed into a practical cipher by Vigenère
(published in 1586)
CS555 Topic 2 12
The Vigenère Cipher
Treat letters as numbers: [A=0, B=1, C=2, …, Z=25]
Number Theory Notation: Zn= {0, 1, …, n-1}
Definition:
Given m, a positive integer, P = C = (Z26)n, and K = (k1, k2,
… , km) a key, we define:
Encryption:
ek(p1, p2… pm) = (p1+k1, p2+k2…pm+km) (mod 26)
Decryption:
dk(c1, c2… cm) = (c1-k1, c2-k2 … cm- km) (mod 26)
Example:
Plaintext: C R Y P T O G R A P H Y
Key: LUCKLUC KLUCK
Ciphertext: N L A Z E I I B L J J I
CS555 Topic 2 13
Security of Vigenere Cipher
• Vigenere masks the frequency with which a
character appears in a language: one letter in
the ciphertext corresponds to multiple letters
in the plaintext. Makes the use of frequency
analysis more difficult.
• Any message encrypted
by a Vigenere cipher is a
collection of as many shift ciphers as there
are letters in the key.
CS555 Topic 2 14
Vigenere Cipher: Cryptanalysis
• Find the length of the key.
• Divide the message into
that many simple
substitution encryptions.
• Solve the resulting simple
substitutions.
– how?
CS555 Topic 2 15
How to Find the Key Length?
• For Vigenere, as the length of the keyword
increases, the letter frequency shows less
English-like characteristics and becomes
more random.
• Two methods to find the key
length:
– Kasisky test
– Index of coincidence
(Friedman)
CS555 Topic 2 16
Kasisky Test
• Note: two identical segments of plaintext, will
be encrypted to the same ciphertext, if the
they occur in the text at the distance , (0
(mod m), m is the key length).
• Algorithm:
– Search for pairs of identical
segments of length at least 3
– Record distances between
the two segments: 1, 2, …
– m divides gcd(1, 2, …)
CS555 Topic 2 17
Example of the Kasisky Test
Key K I N G K I N G K I N G K I N G K I N G K I N
G
PT t h e s u n a n d t h e m a n i n t h e m o o
n
CT D P R Y E V N T N B U K W I A O X B U K W W B
T
CS555 Topic 2 18
Index of Coincidence (Friedman)
Informally: Measures the probability that two random
elements of the n-letters string x are identical.
Definition:
Suppose x = x1x2…xn is a string of n alphabetic characters.
Then Ic(x), the index of coincidence is:
I ( x) P( x x )
when i and j are uniformly
c randomly
i chosen
j from [1..n]
CS555 Topic 2 19
Index of Coincidence (cont.)
S
fi S S
i 0 2
f i ( f i 1) fi
2
S
pi
2
I C ( x) i 0
i 0
n n(n 1) n2 i 0
2
CS555 Topic 2 21
Index of Coincidence of English
• For English, S = 25 and pi can be estimated
i 25
I c ( x) pi 0.065
2
i 0
CS555 Topic 2 22
Finding the Key Length
y = y1y2…yn, , assum m is the key length, write y vertically in an
m-row array
y1 y m 1 ... y n m 1 y1
y y m 2 ... y nm 2
2 y2
... ... ... ... …
ym y2m ... yn ym
CS555 Topic 2 23
Finding out the Key Length
• If m is the key length, then the text ``looks like’’
English text
i 25
I c ( y i ) pi 0.065 1 i m
2
i 0
i 25
1 2 1 1
Ic ( ) 26 2 0.038
i 0
26 26 26
CS555 Topic 2 24
Rotor Machines
• Basic idea: if the key in Vigenere cipher is very long,
then the attacks won’t work
• Implementation idea: multiple rounds of
substitutions
• A machine consists of multiple cylinders
– Each character is encrypted by multiple cylinders
– Each cylinder has 26 states, at each state it is a
substitution cipher
– Each cylinder rotates to change states according to
different schedule
CS555 Topic 2 25
Rotor Machines
• A m-cylinder rotor machine has
– 26m different substitution ciphers
• 263 = 17576
• 264 = 456,976
• 255 = 11,881,376
CS555 Topic 2 26
Earliest Enigma Machine
• Use 3 scramblers (motors): 17576
substitutions
CS555 Topic 2 27
History of the Enigma Machine
• Patented by Scherius in 1918
• Widely used by the Germans from 1926 to the
end of second world war
• First successfully broken by the Polish’s in the
thirties by exploiting the repeating of the
message key
• Then broken by the UK intelligence during the
WW II
CS555 Topic 2 29
Coming Attractions …
• Information-Theoretic secrecy (Perfect secrecy),
One-Time Pad
CS555 Topic 2 30