A Codes and Ciphers Primer
A Codes and Ciphers Primer
A Codes and Ciphers Primer
* Codes and ciphers have been around for millennia, often proving of vital strategic importance. They are still in widespread the 21st century and are an important component to the operation of the internet. This document provides a brief introductio codes and ciphers.
[1] BASIC CONCEPTS [2] SIMPLE SUBSTITUTION CIPHERS [3] SIMPLE TRANSPOSITIONS [4] FREQUENCY ANALYSIS AGAINST CIPHERS [5] CRACKING CODES / CODES VERSUS CIPHERS [6] THE VIGENERE CIPHER [7] TELEGRAPHY & CRYPTOLOGY / FRACTIONATING CIPHERS [8] ONE-TIME PAD CIPHER [9] CIPHER MACHINES [10] MODERN CIPHER SYSTEMS [11] FOOTNOTE: BIBLE CODES & THEIR KIN [12] COMMENTS, SOURCES, & REVISION HISTORY
* The oldest means of sending secret messages is to simply conceal them by one trick or another. The ancient Greek historia Herodotus wrote that when the Persian Emperor Xerxes moved to attack Greece in 480 BC, the Greeks were warned by an Greek named Demaratus who was living in exile in Persia. In those days, wooden tablets covered with wax were used for w Demaratus wrote a message on the wooden tablet itself and then covered it with wax, allowing the vital information to be smuggled out of the country.
The science of sending concealed messages is known as "steganography", Greek for "concealed writing". Steganography ha long history, leading to inventions such as invisible ink and "microdots", or highly miniaturized microfilm images that could hidden almost anywhere. Microdots are a common feature in old spy movies and TV shows. However, steganography is not secure by itself. If someone finds the hidden message, all its secrets are revealed. That led to the idea of obscuring the messa that it could not be read even if it were intercepted, and the result was "cryptography", Greek for "hidden writing".
Cryptography takes two forms: "codes" and "ciphers". The distinction between codes and ciphers is commonly misundersto "code" is essentially a secret language invented to conceal the meaning of a message. The simplest form of a code is the "jar code", in which a particular arbitrary phrase, for an arbitrary example:
The nightingale sings at dawn.
-- corresponds to a particular predefined message that may not, in fact shouldn't have, anything to do with the jargon code p The actual meaning of this might be:
The supply drop will take place at 0100 hours tomorrow.
Jargon codes have been used for a long time, most significantly in World War II, when they were used to send commands o broadcast radio to resistance fighters. However, from a cryptographic point of view they're not very interesting. A proper co would run something like this:
BOXER SEVEN SEEK TIGER5 AT RED CORAL
This uses "codewords" to report that a friendly military force codenamed BOXER SEVEN is now hunting an enemy force codenamed TIGER5 at a location codenamed RED CORAL. This particular code is weak in that the "SEEK" and "AT" wor provide information to a codebreaker on the structure of the message. In practice, military codes are often defined using "codenumbers" instead of codewords, listed in a codebook that provides a dictionary of code numbers and their equivalent words. For example, this message might be coded as:
85772 24799 10090 59980 12487
-- where "85772" maps to BOXER SEVEN, "12487" maps to "RED CORAL", and so on. Codewords and codenumbers are referred to collectively as "codegroups". The words they represent are referred to as "plaintext" or, more infrequently, "clear "plaincode", "placode", or "plaindata".
Codes are unsurprisingly defined by "codebooks", which are dictionaries of codegroups listed with their corresponding their plaintext. Codes originally had the codegroups in the same order as their plaintext. For example, in a code based on codenumbers, a word starting with "a" would have a low-value codenumber, while one starting with "z" would have a highcodenumber. This meant that the same codebook could be used to "encode" a plaintext message into a coded message or "codetext", and "decode" a codetext back into plaintext message.
However, such "one-part" codes had a certain predictability that made it easier for outsiders to figure out the pattern and "cr or "break" the message, revealing its secrets. In order to make life more difficult for codebreakers, codemakers then designe codes where there was no predictable relationship between the order of the codegroups and the order of the matching plainte This meant that two codebooks were required, one to look up plaintext to find codegroups for encoding, the other to look up
codegroups to find plaintext for decoding. This was in much the same way that a student of a foreign language, say French, an English-French and a French-English dictionary to translate back and forth between the two languages. Such "two-part" required more effort to implement and use, but they were harder to crack.
* In contrast to a code, a "cipher" conceals a plaintext message by replacing or scrambling its letters. This process is known "enciphering" and results in a "ciphertext" message. Converting a ciphertext message back to a plaintext message is known "deciphering". Coded messages are often enciphered to improve their security, a process known as "superencipherment".
There are two classes of ciphers. A "substitution cipher" changes the letters in a message to another set of letters, or "cipher alphabet", while a "transposition cipher" shuffles the letters around. In some usages, the term "cipher" always means "substi cipher", while "transpositions" are not referred to as ciphers at all. In this document, the term "cipher" will mean both substi ciphers and transposition ciphers. It is useful to refer to them together, since the two approaches are often combined in the s cipher scheme. However, transposition ciphers will be referred to in specific as "transpositions" for simplicity.
"Encryption" covers both encoding and enciphering, while "decryption" covers both decoding and deciphering. This also im the term "cryptotext" to cover both codetext and ciphertext, though the term "encicode" is sometimes seen instead. The scien creating codes and ciphers is known, as mentioned, as "cryptography", while the science of breaking them is known as "cryptanalysis". Together, the two fields make up the science of "cryptology".
Note that in this example plaintext is printed in lowercase, while ciphertext is printed in uppercase. This convention will be followed in the rest of this document.
Monoalphabetic substitution ciphers go back to at least the fourth century BC. One of the simplest monoalphabetic substitut ciphers, known as a "Ceasar shift", associated with Julius Ceasar, involves shifting letters by a number of positions, say thre
plaintext alphabet: cipher alphabet: abcdefghijklmnopqrstuvwxyz XYZABCDEFGHIJKLMNOPQRSTUVW
YBTXOBQEBFABPLCJXOZE
-- and it still remains more or less readable when translated back to plaintext:
bewaretheidesofmarch
With 26 letters in the English alphabet, there are of course 25 different possible Ceasar shift cipher alphabets. All Bob needs read the cipher is a number from 1 to 25 to define the shift. This number can be thought of as a "key" associated with the Ce shift enciphering "algorithm".
A Ceasar shift cipher is ridiculously easy to crack, since all one has to do is try all 25 Ceasar shift cipher alphabets until one works. Interestingly, however, it is still in use, in the form of the "rot13" scheme used on internet forums, which is a 13-plac Ceasar-shift used to hide the punchlines of jokes and the like.
* A more secure way to build a substitution cipher is to completely mix up the mappings between the plaintext and cipherte alphabets. There are vast number, trillions of trillions, of possible ways to scramble the alphabet, and such a scrambling mig seem on the face of it very secure.
One way to come up with a mixed cipher alphabet is for Alice to take a "keyphrase" consisting of, say, a name, such as RICHARD MILHAUS NIXON, write it down while eliminating any redundant letters, and then complete the cipher by wri down the remaining letters of the alphabet in alphabetical order:
plaintext alphabet: cipher alphabet: abcdefghijklmnopqrstuvwxyz RICHADMLUSNXOBEFGJKPQTVWYZ
This is a simple cipher algorithm, but even if a codebreaker knows that this general scheme was used, the message still cann read without the keyphrase, and a brute-force approach to cracking it is very difficult. This is a fundamental principle of cryptography, stated by a 19th-century Dutch linguist & cryptographer, Auguste Kerckhoffs von Niewenhof), and known as "Kerckhoffs' Principle": The security of a cipher should not depend on an enemy's ignorance of the enciphering algorithm, o enemy's ignorance of the key. In fact, codebreaking is often focused on discovering keys, since the cipher scheme may be w understood.
One way to transpose this message is for Alice to "write in" the words vertically in five rows without any spaces as follows:
m e t o w t e i o e n h c e w m g n e e e k r e i h i t a c t a l d t t m r l o t v n t e
A transposition of the form shown above is extremely easy to crack. Holmes just writes down the letters of the transposition rows, increasing the length of the rows until he sees something that makes sense. However, Alice could make things more d for a codebreaker a bigger headache by reading off columns in an alternating "down" and "up" fashion, or by reading off the transposition in a "spiral" pattern -- "down" on the left side, "right" across the bottom, "up" on the right side, "left" across th the second-to-left column, "down" again, and so on until all letters were transposed. Even more sophisticated transpositions "checkerboard" pattern -- such as a "knight's tour", a grid of numbers that specify the sequence of movements of a chess kni across the grid.
* Given the large number of possible monoalphabetic substitution cipher alphabets, it might seem like a substitution cipher w be very hard to break. In reality, it's very easy if given a reasonably large ciphertext message to analyze, but it took over a thousand years to figure out how.
The basic approach for cracking a monoalphabetic substitution cipher was invented by a multi-talented medieval Arabic sch named al-Kindi, and is now known as "frequency analysis". His work was an outgrowth of efforts by Arabs to perform textu analyses of religious texts to see if they actually were written by the Prophet.
Frequency analysis is a statistical method. In every language, some letters are used on the average more than others, and the percentages of letters in different languages tends to be constant. For example, in English text the three most common letters the average are "e", "t", and "a", while the three least common are "x", "q", and "z". (Average letter frequencies will be diffe different languages.) This means that if a codebreaker sees that "O", "G", and "B" are the most common letters in a cipherte they are likely to represent "e", "t", and "a".
Frequencies of characters in any text may deviate from the average, so this mapping may not be perfectly accurate. Howeve frequency analysis can also be performed on pairs of letters, or "digraphs", in the ciphertext -- "ee" is common in English te not "aa". A codebreaker can also spot common triplets, or "trigraphs", or entire words -- "the" is the most common word in English text. Text may have predictable elements -- for example, Nazi correspondence might start with "Heil Hitler!" Such predictable phrases are known as "cribs". Military correspondence tends to follow standard formats and is often loaded with cribs.
Once given these clues, a codebreaker can then use educated guesswork to "fill in the blanks". For example, an incomplete w of the form "-u-m-ri-e" in text sent from a naval base is likely to be "submarine". This is the same skill that is used to solve crossword-puzzles, and is known to cryptologists as "anagramming". The usage of the word in cryptology is somewhat diffe from the popular usage, which refers to a scrambling of the letters of one word into a different word.
If the frequency analysis of a ciphertext shows a seeming match to normal text, then the cipher is likely to be a transposition frequency analysis seems to show a different mapping of characters from normal text, but trying to plug the proper characte
back into the ciphertext gives absolutely no useful results, then the cipher is likely a combination substitution and transposit
Of course, frequency analysis requires enough text to permit the construction of a useful table of letter frequencies. This is a general truth of cryptanalysis: the more cryptotext available, the easier the cryptotext is to crack, and on the other side of the short or fragmentary cryptotexts can be difficult to crack even if they used an insecure cryptographic scheme.
* Solving a monoalphabetic substitution cipher is easy. Solving even a simple code is difficult. Decrypting a coded message little like trying to translate a document written in an alien language, with the task basically amounting to building up a "dic of the codegroups and the plaintext words they represent.
One fingerhold on a simple code is the fact, mentioned in the previous section, that some words are more common than othe such as "the" or "a" in English. In telegraphic messages, the codegroup for "STOP" (end of sentence) is usually very commo helps define the structure of the message in terms of sentences, if not their meaning.
Further progress can be made against a code by collecting many messages encrypted with the same code and then obtaining intelligence background on the messages, such as the location from where a message was sent, and where it was being sent time the message was sent; events occurring before and after the message was sent; and the normal habits of the people send the coded messages. Cribs can be helpful as well.
Various tricks can be used to "plant" or "sow" information into a code, for example by executing a raid at a particular time a location against an enemy, and then examining code messages in response to the raid. Coding errors are a particularly usefu fingerhold into a code, and naturally people are bound to make errors, sometimes disastrous ones, sooner or later. Of course planting information and exploiting errors works against ciphers as well.
* The most obvious and, in principle at least, simplest way of cracking a code is to steal the codebook through bribery, burg or raiding parties. Constructing a new code is like building a new language and writing a dictionary for it, which is a big job code is compromised, the whole task has to be done all over again, and that means a lot of work for both cryptographers and code users. In practice, when codes were in widespread use, they were usually changed on a periodic basis.
Once codes have been created, their distribution is logistically clumsy, and the probability that the code will be compromise high. In contrast, the security of ciphers is, as mentioned earlier, generally dependent on protecting the cipher keys. Cipher k can be stolen and people can betray them, but they are much easier to change and communicate.
One was the "nomenclator", which was a substitution cipher combined with a set of codewords.
Another trick was to use "nulls", or unused symbols, in ciphers. Suppose that a cipher uses the numbers 00 through 99 represent text. Even if numbers and punctuation are enciphered, that leaves a unused subset of numbers, and these unu numbers or nulls could be littered through the ciphertext. They were simply ignored when the text was deciphered.
Yet another trick was the "homophonic substitution cipher". Given a cipher based on substituting the numbers 00 thro 99 for text, there was no reason that multiple numbers couldn't be used to match a single common character, such as " help defeat frequency analysis.
One of the most significant developments was the "polyalphabetic substitution cipher", which was described in its definitive a paper published in 1586 by a French diplomat named Blaise de Vigenere. A "Vigenere cipher" uses 26 substitution cipher organized using a "Vigenere square" as shown below, with some spacing added here to make it legible:
a bcd efgh ijk lmno pqr stuv wyxz 01: 02: 03: 04: 05: 06: 07: 08: 09: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z BCD CDE DEF EFG FGH GHI HIJ IJK JKL KLM LMN MNO NOP OPQ PQR QRS RST STU TUV UVW VWX WXY XYZ YZA ZAB ABC EFGH FGHI GHIJ HIJK IJKL JKLM KLMN LMNO MNOP NOPQ OPQR PQRS QRST RSTU STUV TUVW UVWX VWXY WXYZ XYZA YZAB ZABC ABCD BCDE CDEF DEFG IJK JKL KLM LMN MNO NOP OPQ PQR QRS RST STU TUV UVW VWX WXY XYZ YZA ZAB ABC BCD CDE DEF EFG FGH GHI HIJ LMNO MNOP NOPQ OPQR PQRS QRST RSTU STUV TUVW UVWX VWXY WXYZ XYZA YZAB ZABC ABCD BCDE CDEF DEFG EFGH FGHI GHIJ HIJK IJKL JKLM KLMN PQR QRS RST STU TUV UVW VWX WXY XYZ YZA ZAB ABC BCD CDE DEF EFG FGH GHI HIJ IJK JKL KLM LMN MNO NOP OPQ STUV TUVW UVWX VWXY WXYZ XYZA YZAB ZABC ABCD BCDE CDEF DEFG EFGH FGHI GHIJ HIJK IJKL JKLM KLMN LMNO MNOP NOPQ OPQR PQRS QRST RSTU WXYZ XYZA YZAB ZABC ABCD BCDE CDEF DEFG EFGH FGHI GHIJ HIJK IJKL JKLM KLMN LMNO MNOP NOPQ OPQR PQRS QRST RSTU STUV TUVW UVWX VWXY
This defines 26 different Ceasar shift ciphers, each of which is weak in itself but which in combination result in a much mor cipher. The idea in the Vigenere cipher is to use a cipher key to select different cipher alphabets in succession as letters are enciphered. Suppose Alice wants to encipher the phrase:
use the force luke
-- with a Vigenere cipher, using the cipher keyword "WARTHOG". All she has to do is scan down the square defined above match the cipher alphabet letter to a particular row, then select the cipher character matching the plaintext letter for that row
W: A: R: T: H: O: G: W: A: R: T: H: O: G: W: row row row row row row row row row row row row row row row 23 01 18 20 08 15 07 23 01 18 20 08 15 07 23 gives gives gives gives gives gives gives gives gives gives gives gives gives gives gives u s e t h e f o r c e l u k e -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> Q S V M O S L L R T X S I Q A
This gives:
Simple frequency analysis cannot crack a Vigenere cipher, and the number of possible keys is so great that finding the right trial-and-error is effectively impossible. Despite the simplicity and elegance of the Vigenere cipher, it was mostly ignored fo next several centuries, since it was regarded as too cumbersome for general use.
* The invention of telegraphy in the mid-19th century revolutionized communications, and also had a significant impact on cryptology. Since telegrams were paid for on a word-by-word basis, there was an incentive to reduce the word count, and so "commercial codes" were developed from the outset to provide what might be called in modern terms "data compression". T also provided some security against casual reading, though they were one-part codes and not generally intended to provide security. Commercial codes did generally offer superencipherment schemes for users who wanted more secrecy.
Commercial codes remained in use until well after the First World War. Some of the entries in a commercial code published 1920s seemed almost tailored for encrypting contemporary pulp fiction:
BUKSI OBNYX PYTUO YBDIG CULKE Avoid arrest if possible. Escape at once. Collided with an iceberg. Plundered by natives. Bad as possibly can be.
The telegraph had a particular impact on military operations, contributing along with the invention of the locomotive and im firearms to a revolution in warfare. However, the telegraph was vulnerable to interception, meaning that good cryptographic schemes were required to ensure security. Codes really couldn't do the job for an army on the move in the field, because of t logistical difficulty of handling the codebooks. The Vigenere cipher came into common use, assisted by a simple invention c "cipher disk", described below. However, although the Vigenere cipher had been regarded as indecipherable for a long time crytanalysts were finally able to get a handle on it. For example, if multiple messages encrypted using a Vigenere cipher and same cipher key were intercepted, frequency analysis could be used across the first letter of all the messages in parallel, the second letter, and so on.
* Other cipher schemes were developed for military telegraphic use. During the American Civil War, Union forces used a "r cipher", basically a transposition cipher based on entire words, not characters. A small set of codewords were implemented conceal words that might provide too much of a clue to snoopers, with cipher books distributed describing various transposi patterns and the codewords to use. A general technique for cracking transpositions called "multiple anagramming" was developed in the 1870s. It requires two messages that have been transposed in the same way -- for a simple example, consider two messages consisting only of one letter word each:
TASPR PRSCO
A little examination shows that both of these two transpositions could be unscrambled into two different words:
TASPR PRSCO -> -> PARTS, TRAPS CROPS, CORPS
Given each message on its own, there's no way to figure out which of the two unscramblings would be correct for each mes However, performing the same unscrambling on both messages in parallel shows that there's only one unscrambling that yie intelligent answer for both messages:
-> ->
* A number of new cipher techniques were also developed based on the "Polybius square" or "checkerboard". This was a w converting letters into pairs of numbers, devised in classical Greek times by a scholar named Polybius. Updated into English Polybius square consists of the letters of the alphabet arranged as a 5-by-5 checkerboard grid with rows and columns numbe and "I" and "J" assigned to the same grid location:
1 1 2 3 4 5 A F L Q V 2 B G M R W 3 C H N S X 4 D I/J O T Y 5 E K P U Z
The row-and-column index numbers are then substituted for letters. For example, "help" becomes "23 15 31 35". For added security, the arrangement of the letters in the grid can be scrambled, preferably by using a cipher keyword to determine the scrambling. For example, if we use the key "RICHARD MILHAUS NIXON", we get a checkerboard that looks like this:
1 1 2 3 4 5 R D N F T 2 I/J M X G V 3 C L O K W 4 H U B P Y 5 A S E Q Z
No matter how the grid is arranged, however, as stated it's just a monoalphabetic substitution cipher, with pairs of ciphertex in place of plaintext letters. It offers little security, though it can be used as a basis for "semaphore" codes, in which a signal hold two flags in five different positions each to send the full alphabet. However, the checkerboard has much more potential plaintext letter is represented by two ciphertext digits, and that gives a new option for encrypting the message. Suppose Alice has a checkerboard based on the key "RICHARD MILHAUS NIXON" as above, and the plaintext:
down with big brother
She can obtain row-and-column indexes for the letters in this plaintext from the grid as follows:
d: o: w: n: ... row row row row = = = = 2 3 5 3 / / / / column column column column = = = = 1 3 3 1
Now she divides the plaintext into blocks of, say, five letters, and writes the indexes vertically underneath the letters:
downw ithbi gbrot her 23535 15131 43135 131 13313 21442 24131 451
To produce the final encryption, she concatenates the two rows on a block-by-block basis:
2353513313 1513121442 4313524131 131451
The message is divided into blocks to make encryption and decryption more manageable; there's no particular reason to use block size of five, any other reasonable value will work as well. This particular scheme breaks down each letter into two cip
letters, a concept known as "fractionation"; and then it breaks the message down into blocks and concatenates the two halve concept known as "seriation". The resulting message cannot be cracked by simple frequency analysis. This scheme is known "bifid" cipher, and was introduced by a Frenchman named Felix Marie Delastelle in a book published in 1901.
It is also possible to "map" plaintext letters to three or more digits that can then be transposed, and such a three-digit mappin called a "trifid" cipher. Some variations on fractionating ciphers are extremely devious and hard to crack.
* The First World War introduced wireless telegraphy into military operations, greatly extending the communications revolu begun by the telegraph. However, radio communications placed new demands on security, since it was so very easy to inter messages sent over the airwaves.
Since warships at sea could now communicate with shore stations over wireless, navies adopted code systems. Naval codeb were often bound with metal plates in the covers so they could be thrown overboard in an emergency and sink. Since World I on land quickly bogged down on land into immobile trench warfare, armies also adopted codes for frontline operations. Th was relatively little problem in distributing the codebooks, the armies not being on the move, and "trench codes" came into common use.
Sophisticated cipher systems were also developed. One of the most significant achievements in cryptography in the First Wo War was a cipher that was, and remains, uncrackable even in principle. As is typical of black magic, it had a major catch.
A Vigenere cipher based on a keyword becomes more secure with a longer keyword. It also becomes more secure if the keyword is made more unpredictable, to prevent a codebreaker from using guesswork to determine the keyword. That mean that the most secure possible Vigenere cipher is one where the keyword is as long as the message, with the keyword made u completely unpredictable random characters.
To implement such a cipher, sets of random keys could be printed on a pad of paper, with each page on the pad used once a then thrown away. As a result, this scheme is called a "one-time pad" cipher. The interesting thing about the one-time pad is is logically impossible to crack by analytic means. Imagine being given a page of completely random letters: it's impossible "crack" because there's no message there, it's just noise. Taking a message and changing all the characters at random also re in a page of noise, and it's just as uncrackable. The only way the message can be extracted is by telling the recipient of the message with a one-time pad what random changes were made to the letters in the text.
A one-time pad cipher has an interesting property: it is not so much true that no signal can be extracted from pure noise, but any signal can be extracted from noise. Since there's no fixed pattern in the ciphertext or the key, a key can be easily synthes to produce every possible message that will fit into the number of letters, such as instructions to attack the enemy, a shoppin or dirty limericks.
* The catch to the one-time pad cipher is that a specific random key can only be used once to encipher one message. If two messages are enciphered using the same key, then it is no longer impossible to crack the cipher, since a (painful) exercise in multiple anagramming could be used to decipher the texts, trying every possible key on both messages until both messages a revealed. This makes the one-time pad cipher logistically clumsy to deal with, worse than a code, requiring distribution of u pads to every user -- and so it is only used in very high-security communications.
* The "cipher disk", mentioned above, was one of the first cipher machines, invented in the 15th century. The cipher disk co of two nested disks, including an outer disk labelled with all the letters of the alphabet, and an inner disk also labelled with a
letters of the alphabet -- but not necessarily in the same order. The outer disk defines the plaintext alphabet, while the inner defines a monoalphabetic substitution cipher alphabet.
Suppose Alice takes her cipher disk and rotates the inner disk so that the letter "A" lines on the inner disk lines up with, say letter "q" on the outer disk. She can then encipher a message by taking each letter, looking up the plaintext letter on the oute and then writing down the matching ciphertext letter next to it on the inner disk. The cipher disk can be also made in the for "slide", with two alphabets written on strips of cardboard, or whatever, that can be slid next to each other. Of course, each s should have two repeating alphabets to permit them to be read conveniently when offset.
The cipher as described remains a feeble monoalphabetic substitution cipher, which is not only easy to crack, but also so eas use that the cipher disk hardly seems very handy. Where the cipher disk comes in useful is in dealing with a Vigenere ciphe
For example, suppose Alice wants to encrypt a message using a Vigenere cipher with the cipher key word "WARTHOG". T encipher the first letter, she moves the "W" on the inner disk to match up with the "a" on the outer disk, and then finds the ciphertext letter on the inner disk matching the desired plaintext letter on the outer disk. Next, Alice enciphers the second le the inner wheel by moving the "A" on the inner disc to match the "a" on the outer disk, then looking up the ciphertext letter inner disk that matches the desired plaintext letter on the outer disk. She repeats this process for the rest of the keyword, and starts all over again at the beginning of the keyword.
* The cipher disk led to a more sophisticated cipher machine, invented late in the 19th century by a French military officer n Etienne Bazeries, and known as the "Bazeries cylinder". (The basic idea was actually invented by American President Thom Jefferson a century earlier, but then forgotten.) A Bazeries cylinder consists of a set of roughly 20 to 30 numbered disks, wi different cipher alphabet on the edge of each disk, and a hole in the center of the disks to allow them to be stacked on an axl The disks are removeable and can be mounted on the axle in any order desired. The order of the disks can be considered the cipher key for the Bazeries cylinder, with both Alice and Bob arranging the disks in the same predefined order.
To encrypt a message, Alice rotates the disks to produce the plaintext message along one "row" of the stack of disks, and th selects another row as the ciphertext. To decrypt the message, Bob rotates the disks on his cylinder to produce the ciphertex along a row. It is handy if both Alice and Bob know the offset of the row, but not really necessary since Bob can simply loo around the cylinder to find a row that makes sense. It was also possible to implement the same scheme with slides containin cipher alphabets on a frame. Both schemes were used in World War I and well into World War II; they provided strong encryption, but they were hardly indecipherable.
* After World War I, a number of much more sophisticated cipher machines were built, the most famous being the "Enigma patented in 1918 by a German engineer named Arthur Scherbius.
As it emerged, the Enigma was a wooden box with a keyboard and a bank of lettered lights corresponding to the keys. To encrypt a message, a plaintext character was typed in, and after scrambling the appropriate light was turned on to give the ciphertext character. The ciphertext was then sent using Morse code over radiotelegraph. The operation of the machine was "reciprocal", in that if the "A" key was pressed and lit up the "Z" lamp, then pressing the "Z" key would light up the "A" lam meant that at the receiving end, as long as the operator had an Enigma machine set up in same way, he could just type in the ciphertext character to get the plaintext character: the procedure for encryption and decryption was the same.
The scramblings between the input and output were performed by what was called a "rotor" system. A rotor was a wheel wi electrical contacts on each side and scrambled wiring linking the contacts on the two sides. The scrambled wiring represente mixed substitution cipher alphabet. There were three rotors in series fed into one another, with this assembly referred to as a "basket". Every time a key was pressed, the first rotor would turn one position. Every time the first rotor went full circle (after 26 keypresses), the second rotor would turn one position. Every time the second rotor went full circle (after 26 * 26 = 676 keypresses), the third rotor would turn one position. When the third rotor went full circle (after 26 * 26 * 26 = 17,576 keypresses), the sequence would start all over again.
The third rotor was wired into a disk called the "reflector" that simply rerouted the connections on the output of the third dis into themselves. In operation, the electrical signal from the input key ran up through the basket of rotors to the reflector, and back down through the basket to the output lamp. This is how the Enigma achieved reciprocal operation, allowing both encr and decryption using the same configuration.
Each of the rotors had a different wiring scheme. The three rotors were removeable and could be rearranged in six different It was also possible in principle to have a stockpile of more than three rotors and select three from the set for use, further multiplying the possibilities. The Enigma also had a plugboard that allowed patch cords to be swapped around to mix six dif sets of letters. Setup of the Enigma involved selecting the order of the three rotors, selecting their starting positions, and sele the patch cord arrangement. This setup amounted to the "key" or "daykey" for the Enigma cipher, and there were a vast num possible daykeys.
* The German military standardized on the Enigma for encryption before World War II. Despite the complexity of the Enigm cipher, Polish cryptanalysts were able to figure out patterns in the operation of the Enigma that allowed them to determine th daykey, using a set of Enigma simulators known for some obscure reason as a "bombe".
However, in 1938 the Germans improved their Enigma security by providing five rotors, increasing the number of possible configurations of three rotors selected from the set, and increased the number of pairs of letters swapped by the plugboard to That greatly multiplied the possible numbers of daykey configurations, demanding a much more complicated and expensive to figure out the settings. By this time, Nazi dictator Adolf Hitler was making threatening noises against Poland, and so the got in touch with British intelligence to tell them how Enigma had been cracked.
The Nazis invaded Poland in September 1939, setting off World War II. They quickly overran the country. The British were hard-pressed by defeats early in the war, and to help fight back set up a sophisticated codebreaking operation at an estate no of London named Bletchley Park. Although the Germans had tightened up their procedures for Enigma operation, eliminatin loopholes that the Poles had exploited to ferret out daykeys, Bletchley Park cryptanalysts -- most significantly a brilliant Ox mathematician named Alan Turing -- were able to develop a new, more sophisticated bombe system to once again crack Eni
* A range of cryptographic schemes were used in World War II, including traditional military codes, much like the naval co and trench codes used in World War I, as well as the Bazeries cylinder and slide schemes. The war also saw considerable us "telecipher" systems for high-level secure communications.
A telecipher system was an encrypted teletypewriter. Teletypewriters used a sequence of on-off electrical signals to transmit receive letters, using a scheme known as the "Baudot code". If the "on" signal is represented as a "1" and the "off" signal represented as a "0", then the Baudot code had the form of a set of "binary" numbers as follows:
A B C
? :
This scheme uses five "binary digits" or "bits" per letter; there were "shift" characters to permit use of a secondary set of nu and punctuation characters.
A telecipher machine scrambled the "stream" of Baudot plaindata using what was known in the old days as "modulo-two arithmetic" and is now known as an "exclusive OR" or "XOR" operation. The basic rules of the XOR operation are as follow
0 0 1 1 XOR XOR XOR XOR 0 1 0 1 = = = = 0 1 1 0
In simpler terms, if two bits are XORed that have the same value, the result is a "0". If two bits are XORed that have a diffe value, the result is a "1". XOR can be performed with multibit values on a bit-by-bit basis -- for example, two five-bit value and Y can be XORed to give a 5-bit value Z:
X: Y: 10011 00110 _____ 10101
Z:
The XOR operation has the interesting property of being "reversible", in that if two binary values X and Y are XORed toget then the result Z can be XORed with Y to give X again:
Z: Y: 10101 00110 _____ 10011
X:
A telecipher machine used an arrangement of electromagnetic relays to XOR the plaindata with a stream of seemingly rando as a "mask":
data: mask: 10100 00001 00110 10010 00100 10100 00110 10000 10010 00001 01010 01001 00101 11011 10110 00111 11001 10101 01010 10000 11001 10001 _________________________________________________________________ 11101 00100 11101 00100 00011 01101 10011 11010 00010 11000 11011
cipher:
The enciphered binary stream could be deciphered by simply XORing it again with the same mask.
The mask was generated by an arrangement of rotating "wheels", with sets of pins on them that could be set in or out. The w could generate very long streams of bits in an unpredictable fashion, and telecipher machine messages could be very hard to crack. In fact, the British developed one of the first electronic computers, named "Colossus" and built with vacuum tubes, to German teleciphers messages.
* In the postwar period, electronic computers became much more widespread, and modern cryptography is effectively based computer-based ciphers. Two classes of "digital" ciphers were developed: "stream" ciphers and "block" ciphers.
The old telecipher systems essentially implemented stream ciphers. A stream cipher operates by taking a stream of plaindata and XORing them on a bit-by-bit basis with a mask of more or less random bits. The cipher stream can then be deciphered b XORing it again with the same mask. The computer generates the mask using some variation of "pseudo-random" bitstream
generation algorithm. Block ciphers, in contrast, break up the stream of data bits and perform a complicated set of shufflings XOR operations on each block. In either case, a "key" consisting of a number of bits is used to specify encryption and decry
In the 1970s, the US National Bureau of Standards (NBS, now the National Institute of Standards & Technology / NIST) created a specification for a "Data Encryption Standard (DES)", using a 56-bit block cipher scheme. DES became a standard most of the rest of the century. Modern digital ciphers will use a key at least 100 bits long.
By the 1970s, use of computing and computer networking was beginning to take off, and some cryptologists were looking forward to the future. One of the issues that was considered was the "key exchange" problem. All ciphers up to that time we "symmetric", meaning the same key was used to encrypt and decrypt a message. However, symmetric ciphers posed some problems in the emerging world of computer networking. In modern terms, suppose customers are visiting a business websi want to establish secure communications, say to make a credit-card payment. In that case, the business would have to excha different key with each customer to guarantee security, which would be very cumbersome.
In 1975 a cryptologist named Whitfield Diffie at Stanford University in California came up with an idea that seems almost o in hindsight but must have sounded preposterous at the time: "public key" cryptography. The idea was that a public key ciph would have two keys: a "public" key, available to anyone, that can be used to encipher a message; and a "private" key, know only to one person, that can be used to decipher a message. The bizarre thing about a public key cipher is that the public key be used to encipher a message, but once that message is enciphered, only the private key can decipher the message.
In public key cryptography, if Bob wants to send an encrypted message to Alice, he obtains her public key. Alice could put public key on her website or business card if she liked for anyone to use. Bob uses Alice's public key to encipher a message then sends it off. He doesn't need to worry about anyone reading the message, since the public key can't be used to decrypt i Only Alice can decrypt it, using her private key.
Diffie had no idea of how to do this, but in 1975 he and his colleagues published the concept anyway to inspire other cryptologists. About two years later, three researchers at the Massachusetts Institute of Technology (MIT) computer science department named Ron Rivest, Adi Shamir, and Ron Adleman announced that they had developed a public key cipher, nam "RSA" after their initials. RSA, and other public-key ciphers that followed, had limitations in that they required a lot of com power and were relatively easy to crack. As a result, they weren't used to encipher long messages, instead being used to enc a key for a block cipher that was used to send the actual message.
* One of the interesting things that could be done with public-key cryptography was "message authentification" -- that is, pr a guarantee that a message was actually sent by the person who was supposed to have sent it. Normally, in a public-key ciph message is encrypted with the public key and decrypted with the private key. It also works the other way around: a message be encrypted with the private key and decrypted with the public key. Such an "inside-out" encryption, to give it a name, sou absurd, since anyone could decrypt the message, but the point is that only the person with the private key could have encryp
To validate a message, a "hash" can be made of the message. A hash is a relatively short string of bits that is created from th message, with the interesting property that the slightest change in the message will result in a very different hash. Alice can encrypted message to Bob using normal cryptographic methods, and then can send a hash to Bob using inside-out encryptio Bob extracts the hash with Alice's public key and then performs a hash on the decrypted message to see if it matches the has sent by Alice. If Alice didn't really send the message, Bob won't be able to decipher it; if the message has been tampered wi Bob's hash won't match.
There's another aspect of message authentification. Suppose Bob hasn't heard from Alice in years, has lost track of her, and starts getting messages from her again. How does he know the messages are really from Alice and not somebody, say Zelda pretending to be Alice? Zelda can hand him her public key and Bob would be no wiser.
If Bob wants to make sure Alice really is Alice, he can obtain a "digital certificate" or "cert" from a "certification server". Th secure network servers operating on a "trusted third party" basis that store the public keys associated with specific users, and provide them on demand. Bob can get the cert associated with Alice from a certification server and then check to see if the p key he has matches that in Alice's cert.
* Modern digital encryption schemes are commonly used on the internet. For one particularly important example, most peop who surf the internet to make online purchases have used cryptography to provide their credit-card numbers to vendors, usin "Secure Sockets Layer (SSL)" protocol. Pages protected by SSL are designated with an "https" prefix instead of the convent "http" prefix.
SSL works more or less transparently to the user, and uses both public-key and symmetric ciphers. Supposed Alice wants to make a purchase from an online vendor and needs to give the vendor her credit-card number. Her web browser will downloa the vendor's public key, validate it against the vendor's cert -- a web browser usually maintains a list of certs for prominent o businesses -- and use it to encrypt a secret symmetrical key produced automatically at (more or less) random by the Web browser. The Web browser will pass the encrypted secret key to the vendor, and this secret key will be used to encrypt the session. Actually, several secret keys are exchanged, but a detailed discussion of the complexities is a bit beyond the scope document. BACK_TO_TOP
* One of the more dubious aspects of cryptology is the hunt for hidden ciphers in great works of literature, particularly the B For want of a better term it might be called "pseudocryptology". The usual approach of pseudocryptologists is to scan throu text of interest and take out letters at intervals, for example every 13th letter, every 100th letter, or every 1,617th letter. They arrange the letters in a block and see if patterns pop out. The patterns are typically identified as scrambled words buried in a stream of gibberish. It is not too surprising that one can actually extract seemingly sensible remarks using this scheme. It's a a logical level with, say, playing the music of the Beatles backwards and listening for secret messages left by John Lennon.
A modern advocate of such hidden codes, Michael Drosnin, has published a series of BIBLE CODE books that made predictions of a wide number of events hidden in the Bible. The books made the best-seller lists, despite the fact that most o predictions identified by Drosnin in his books were, conveniently, of events that had already happened. His predictions of ac future events, such as the end of the world in the year 2000, were somewhat inaccurate.
Drosnin incautiously challenged critics to find similar predictions about, say, the assassination of a prime minister in MOBY and so an Australian mathematician named Brendan McKay went out and did precisely that, identifying "predictions" of nin assassinations. Similar exercises were performed on WAR AND PEACE and other classics. However, pseudocryptology ha been around for a long time, and though it is easily debunked, it still refuses to die off. BACK_TO_TOP
* In 2001 I released the first edition of the multichapter online document CODES, CIPHERS, & CODEBREAKING, wh has evolved into a fairly thorough and extensive discussion of the subject. It's proven reasonably popular and I'm basically p with it, but its detail created an unavoidable problem: it was too much for a lot of people to want to read.
As a result, I decided to render it down into a one-chapter primer document for those who were curious but didn't want to do lot of work. Those who have read this primer and are interested in further investigation can go on to the parent document. S all this document amounts to is a grossly abridged version of the parent document, no sources are cited here. * Revision history:
v1.0.0 / 01 may 07 / gvg