# Vigenère cipher

The **Vigenère cipher** is a method of encryption invented by Giovan Batista Belaso and described in his 1553 book *La cifra del. Sig. Giovan Batista Belaso*. It obtained its name after being misattributed to Blaise de Vigenère in the 19th century. The algorithm is a keyword-based system that uses a series of different Caesar ciphers based on the letters of the keyword. It is a simplified form of polyalphabetic substitution.

It functions as follows:

1. The encipherer chooses a piece of plaintext:

VIGENERE2. The encipherer chooses a keyword and repeats it to become the length of the ciphertext:

CIPHCIPH3. To encipher letter L1 of the plaintext, the encipherer creates a new alphabet wherein A is shifted to letter L1 of the ciphertext, B is shifted to the next letter, etc.:

ABCDEFGHIJKLMNOPQRSTUVWXYZ --> CDEFGHIJKLMNOPQRSTUVWXYZAB4. The encipherer finds the letter that corresponds to L1 in the new alphabet. This is now L1 of the plaintext:

V --> X5. This is repeated for each letter in the plaintext and its corresponding letter in the key:

VIGENERE + CIPHCIPH --> XQVLPMFL

A simpler, but equivalent way to encode a message is to write out a copy of the alphabet, and then write the keyword vertically beneath the letter A. Then, starting with each letter, complete the alphabet (starting again with A after reaching Z). For example, if the keyword is "CUP", one would write:

ABCDEFGHIJKLMNOPQRSTUVWXYZ CDEFGHIJKLMNOPQRSTUVWXYZAB UVWXYZABCDEFGHIJKLMNOPQRST PQRSTUVWXYZABCDEFGHIJKLMNO

To encode a message, one locates the plaintext letter in the top row, and then reads the ciphertext letter from one of the alphabets below, using each one in turn. One can also write out the entire set of these shifted alphabets, picking the right row for any letter of the key, the resulting block of alphabets is known as a tabula recta. This use of multiple alphabets in rotation to encrypt a message is why this is called a polyalphabetic cipher; the systematic and repeated use of multiple alphabets is why this cypher is not as secure as this type can be.

The advantage of this cipher is that it makes frequency analysis more difficult. Frequency analysis is the practice of decrypting a message by counting the frequency of ciphertext letters, and comparing that with the letter frequency in normal text. For instance if P occurred a lot in a ciphertext one could suspect that P corresponded to E, because E is the most frequently used letter in English. Using the Vigenere cipher, E can be enciphered as any of several letters in the alphabet in the Vigenere cipher, defeating frequency analysis.

This cipher was not actually the strongest one invented by Vigenère. He also invented an autokey cipher which mas much stronger than this one, but the name "Vigenère cipher" became associated with this polyalphabetic cipher instead. In fact the two ciphers were often confused, and both were cometimes called "*le chiffre indéchiffrable*", or "the unbreakable cipher". For nearly 300 years this cipher was thought to be unbreakable. But Charles Babbage and Friedrich Kasiski independently broke the cipher in the middle of the 19th century. Babbage actually broke the much stronger autokey cipher, while Kasiski is generally credited with the first published solution to fixed-key polyalphabetic ciphers.

The technique used to break it is described below. The weakness relies on the fact that the key is relatively short and constantly repeated: as a result, common words like "the" will eventually be encrypted using the same key letters, leading to repeated groups in the ciphertext. If the message is long enough, therefore, the following procedure can be used:

- The analyst looks for repeated groups of letters and counts the number of spaces between the beginning of each repeated group. For instance if the ciphertext was FGXTHJAQWNFGXQ, the distance between FGX's is 10. The analyst repeats this for as many repeated groups as appear in the text.
- The analyst next factors each of these numbers. If any number is repeated in the majority of these factorings, this is probably the length of the keyword. Why? Repeated groups can appear by coincidence, but are much more likely to occur when the same letters are encrypted using the same key letters. The key letters are repeated at multiples of the key length, so the distances found in step #1 are most often multiples of the key length.
- Once the keyword length is known, the clever observation of Babbage and Kasiski comes into play. If the keyword is N letters long, then every Nth letter must be enciphered using the same letter of the keytext. Grouping every Nth letter together, the analyst has N "messages", each encrypted using a one-alphabet substitution, and each piece can then be solved using frequency analysis.
- Using the solved message, the analyst can quickly determine what the keyword was. Or, in the process of solving the pieces, the analyst might use guesses about the keyword to assist in breaking the message.
- Once the interceptor knows the keyword, he/she can use that knowledge to read future messages, if the key does not change.

Kasiski actually used "superimposition" to solve the Vigenere cipher. He started by finding the key length, as above. Then he took multiple copies of the message and laid them one-above-another, each one shifted left by the length of the key. Kasiski then observed that each *column* was made up of letters encrypted with a single alphabet. His method was equivalent to the one described above, but is perhaps easier to picture.

Modern attacks on polyalphabetic ciphers are essentially identical to that described above, with the one improvement of coincidence counting. Instead of looking for repeating groups, a modern analyst would take two copies of the message and lay one above another. (Actually, modern analysts use computers, but this description illustrates the principle.) The analyst then shifts the bottom message one letter to the left, then two letters to the left, etc., each time going through the entire message and counting the number of times the same letter appears in the top and bottom message. The number of "coincidences" goes up sharply when the bottom message is shifted by a multiple of the key length, because then the adjacent letters are in the same language using the same alphabet. Having found the key length, cryptanalysis proceeds as described above using frequency analysis.

## External Links

- Basic Cryptanalysis at H2G2
- Java Vigenere applet with source code (GNU GPL)