The Caesar cipher is an encryption technique used by Julius Caesar to send communications that had military importance for his empire. Today is a technique that is studied in Cryptography under the classification of substitution ciphers, a topic that at the same time is under classic cryptography. It is also known as a shift cipher, Caesar code or Caesar shift.
In this post, you will learn everything you need to know about the Caesar cipher, including how to break it.
Table of Contents
- General description
- Caesar cipher formula
- Advantages and disadvantages
- Python implementation
- Try it online
- Brute force attack in python
The technique is very simple.
First, you need an alphabet, let’s say a, b, c, …, z. Then you need a key (the shift) and a message (a.k.a. plain text) to encrypt. In this case, a key is a number from 1 to the number of symbols you have in the alphabet. For instance, if you use the English alphabet, you can use as a key a number between 1 and 25, including both.
Another specific case to consider is when the shift goes beyond z, you start again with a. Let’s say you are encrypting y with the key 3, then the encrypted letter will be b (y, z, a, b)
The process is quite simple, let’s see the following example.
Plain text: hi
- Substitute the letter h for the next letter (shift 1), in this case, g.
- Substitute the letter i for the next letter (shift 1), in this case, j.
So, the encrypted text is gj.
To decrypt the text, we use the key -1.
- Substitute the letter g for the previous letter (shift -1), in this case, h.
- Substitute the letter j for the previous letter (shift -1), in this case, i.
So, the decrypted text (plain text) is “hi”.
Caesar cipher formula
From the description above, you can see that a Caesar cipher is easy to implement. You can just use several conditions as in the description.
However, there is an even easier way to do it. Using the Caesar cipher formula.
Let’s assign a number to each alphabet letter a=0, b=2, …, z=25
Then the formula is as follows:
c = (letter + key) mod 26, where c is the encrypted character. Mod is the remainder of the division.
Using the remainder of the division will guarantee us that if the sum of letter + key is greater than 26, then the encrypted character will always be part of the alphabet (greater or equal to 0 and less than 26).
As an example, if we get 26 as a result (notice that z is the last character and equals 25), we should get the letter a. So, 25 mod 26 = 0 and 0 represents a. Similarly, if we get 27, 27 mod 26 = 1, then we get the character b, and so on.
Advantages and disadvantages
The main advantage of the Caesar cipher is its simplicity and the speed of the technique.
You can encrypt and decrypt a text within seconds.
This happens because the sum and division operations using small numbers are fast and easy to calculate.
The main disadvantage is that the number of possible keys is limited, only 25 possible keys. For this reason, it is very easy to break this cipher with the use of a computer. Even manually, you just have to try 25 different keys.
To implement the Caesar cipher in python, we are going to use the function ord. This function returns the Unicode code for a single character.
For instance, ord(‘a’)=97, ord(‘b’) = 98, ord(‘c’)=99, …, ord(‘z’)=122
The opposite function in python is chr. For instance, chr(97)=’a’, chr(98)=’b’ and so on.
So, let’s see the python code.
def caesar(key, message): message = message.lower() message = message.replace(' ','') cypher_text = "" for letter in message: cypher_text = cypher_text + chr((ord(letter) + key - 97) % 26 + 97) return cypher_text if __name__=='__main__': print('encrypted message') encoded = caesar(3,'secret message') print(encoded) print('decrypted message') print(caesar(-3,encoded))
In this example, we are going to remove spaces from the text to encrypt, and we are also assuming that the text-only has English alphabet symbols.
First, we convert the message to lower case characters (find out the value of ord(‘A’) is not equal to ord(‘a’)). Then we remove all white spaces.
The formula shows that we are subtracting 97. The reason behind that is that the range of ord(‘a’) to ord(‘z’) is 97 to 122 (see the examples at the beginning of this section). So, if you subtract 97, the range will be 0 to 25, which is what we need. Then, after we calculate the remainder of the division, we add 97 so we get the respective letter.
Let’s execute the python code from the previous section and print the result.
Now, let’s try with another message: “This message is encrypted with Caesar cipher”
Try it online
Brute force attack in python
To do a brute force attack we should try every possible key. In this case, it is easy, because they are only 25 possible keys. For a computer, it can take less than a second to try all the keys depending on the length of the encrypted text.
There are three requirements needed for a brute force attack to be successful:
- We know the encryption and decryption algorithms that were used
- The number of possible keys is small
- The plain text language is known, and we can easily recognize it.
Let’s find out if it is possible to be successful in our case:
- We know the technique in use is the Caesar cipher
- Only 25 possible keys
- Plain text is the English text
So, let’s try a brute force attack on the encrypted message: euxwhirufhdwwdfn
To do this, I will implement a python program that will test all the 25 keys.
def caesar(key, message): message = message.lower() message = message.replace(' ','') cypher_text = "" for letter in message: cypher_text = cypher_text + chr((ord(letter) + key - 97) % 26 + 97) return cypher_text if __name__=='__main__': #brute force attack for i in range(1,26): print(i, '-->' , caesar(-i,'euxwhirufhdwwdfn'))
When you run the code above, you will get the following result.
From the picture above you can see that with key=3, the plain text is “bruteforceattak” (remember we remove the spaces before the encryption), which is the plain text.
Remember I mentioned before that the language of the plain text should be known? The reason is that the plain text that was encrypted can be in several formats/languages. For instance, if the plain text was in binary or hexadecimal system, you won’t be able to identify the plain text like in the previous example.
In such cases, you should know what is the language used, so you can translate it then to something that you can easily understand just by looking at it.
Caesar cipher is one of the most known substitution ciphers. It can be easily broken with the use of a computer.
For the brute force attack to be successful we should know 3 things:
- The text was encrypted with the Caesar cipher.
- We know the language of the plain text and can recognize it easily.
- The number of keys is small.