What is the MD5 hash function (MD5 message-digest algorithm)?

MD5 is a widely used hash function that produces a 128-bit message digest (hash value). It was initially designed as a cryptographic hash function, but vulnerabilities were found later, and it is therefore no longer considered suitable for cryptographic applications.

It was created in 1991 by Ronald Rivest and published in 1992 as RFC 1321.

Introduction
MD5 algorithm
- Processing each 512-bit block
Is MD5 Secure?
Python code to generate MD5 message digest
MD5 message digest from the command line

Introduction

There are two main ways to construct hash functions:

Based on compression. A compression function transforms its input into an output of a smaller size, and it is applied iteratively to the message.
- MD5, SHA-1, and SHA-2
Based on permutations. The underlying permutation produces an output of the same size as its input.
- SHA-3 (Keccak)

MD5 is based on the Merkle-Damgård construction. Find below an illustration of this construction.

Notice that the original message is processed in blocks. The blocks are chained together: a compression function f takes one block and the current chaining value and produces a new chaining value. The chaining value starts as the initial vector (IV), changes after every block, and its final value is the hash value.
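The chaining described above can be sketched generically in a few lines. This is a minimal illustration, assuming the message has already been padded and split into blocks; the compression function toy_f is a hypothetical stand-in chosen only to make the example runnable. It is not secure and is not the compression function MD5 uses.

```python
from typing import Callable, List

# A compression function maps (chaining value, block) -> new chaining value.
Compress = Callable[[bytes, bytes], bytes]


def merkle_damgard(blocks: List[bytes], iv: bytes, f: Compress) -> bytes:
    """Chain a compression function f over pre-padded, fixed-size blocks."""
    h = iv  # the chaining value starts at the initial vector
    for block in blocks:
        h = f(h, block)  # each block updates the chaining value
    return h  # the final chaining value is the hash value


def toy_f(h: bytes, block: bytes) -> bytes:
    """Hypothetical toy compression function (NOT secure, NOT MD5's)."""
    out = bytearray(h)
    for i, byte in enumerate(block):
        j = i % len(out)
        out[j] = (out[j] * 31 + byte) % 256  # mix each block byte into the state
    return bytes(out)


digest = merkle_damgard([b"block-one-......", b"block-two-......"], bytes(4), toy_f)
print(digest.hex())
```

MD5 follows the same pattern with a 128-bit chaining value and its own compression function, H_MD5.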

In the case of the MD5 message digest, the output will be 128 bits.

MD5 algorithm

MD5 was created by Ronald Rivest in 1991. It is an improvement on its predecessors, MD2 and MD4.

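This function first pads the message to a length that is 64 bits less than a multiple of 512. Padding is always applied, even if the message already has that length: a single 1 bit is added, followed by as many 0 bits as needed. The last 64 bits encode the length of the original message. This length field (known as Merkle-Damgård strengthening) is what lets the collision resistance of the whole hash be reduced to that of its compression function. It does not prevent length extension attacks: MD5, like other plain Merkle-Damgård hashes, is vulnerable to them.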
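The padding arithmetic can be checked with a small sketch. The helper md5_padding_bits is a hypothetical name used only for illustration; it counts bits, while real implementations work on bytes. For example, the 18-byte message 'This is my message' is 144 bits long, so it needs one 1 bit, 303 zero bits, and the 64 length bits to fill exactly one 512-bit block.

```python
def md5_padding_bits(k: int) -> tuple:
    """Return (number of 0 bits, total padded length) for a k-bit message."""
    zeros = (448 - (k + 1)) % 512  # after the single 1 bit, reach 448 mod 512
    total = k + 1 + zeros + 64     # plus the 64-bit length field
    return zeros, total


for k in (144, 447, 448):
    zeros, total = md5_padding_bits(k)
    print(f"K={k}: 1 one bit + {zeros} zero bits + 64 length bits -> {total} bits")
```

The case K=448 shows why padding is always applied: a message that already sits at 448 mod 512 bits still receives a 1 bit, 511 zero bits, and the length field, growing to 1024 bits.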

Notice that hash functions do not use secrets (except keyed hash functions). All the information used to create a hash is public. The MD5 algorithm is described in RFC 1321.

Find below a figure that shows the general scheme of MD5.

How does it work?

The input of the algorithm is a message of K bits.

The first step is to pad this message to a length that is 64 bits short of a multiple of 512 bits. To do that, we append a single 1 bit followed by as many 0 bits as needed. Then we append the length of the original message, K mod 2⁶⁴, as a 64-bit value (low-order byte first, as specified in RFC 1321).

The next step is to divide the padded message into 512-bit blocks (Y1, Y2, and so on).
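The padding and splitting steps can be sketched in Python as follows. This is a minimal sketch assuming the message is a whole number of bytes (as with hashlib), so the single 1 bit followed by seven 0 bits becomes the byte 0x80; the function names md5_pad and split_blocks are hypothetical.

```python
import struct
from typing import List


def md5_pad(message: bytes) -> bytes:
    """Pad a byte string following the MD5 padding rule from RFC 1321."""
    bit_len = (len(message) * 8) % 2**64          # K mod 2^64
    padded = message + b"\x80"                    # a single 1 bit, then 7 zero bits
    padded += b"\x00" * ((56 - len(padded)) % 64) # zero bytes up to 448 mod 512 bits
    padded += struct.pack("<Q", bit_len)          # 64-bit length, low-order byte first
    return padded


def split_blocks(padded: bytes) -> List[bytes]:
    """Split a padded message into 512-bit (64-byte) blocks Y1, Y2, ..."""
    assert len(padded) % 64 == 0
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


blocks = split_blocks(md5_pad(b"This is my message"))
print(len(blocks), blocks[0].hex())
```

A 55-byte message still fits in one block, while a 56-byte message needs two, because the 0x80 byte and the 8-byte length field no longer fit after it.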

Now the blocks are processed iteratively. We start from the initial vector abcd (as in the Merkle-Damgård structure). We apply the compression function H_MD5 (described below) to the first block and obtain a modified vector (a’b’c’d’). This new vector is the initial vector for processing the second 512-bit block (Y2). We repeat this procedure until all the 512-bit blocks have been processed.
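The values of the initial vector abcd are fixed by RFC 1321 (section 3.3), which lists each word low-order byte first. The short sketch below uses those published constants and shows that, written out as little-endian 32-bit words, they form the byte sequence 01 23 45 67 89 ab cd ef fe dc ba 98 76 54 32 10. The variable names are illustrative.

```python
import struct

# Initial values of the MD5 registers a, b, c, d as given in RFC 1321.
A0, B0, C0, D0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

# Pack them as little-endian 32-bit words (low-order byte first, as in the RFC).
iv_bytes = struct.pack("<4I", A0, B0, C0, D0)
print(iv_bytes.hex())  # 0123456789abcdeffedcba9876543210
```

The final digest is written out in the same little-endian byte order from the last values of a, b, c, and d.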

In the end, we will have a digest (hash value) of 128 bits.

Let’s look more closely at what happens when one block is processed (the H_MD5 step).

Processing each 512-bit block

In the figure below you can see what happens when the first block is processed.

Let’s examine the algorithm.

The first 512-bit block, Y1, is processed as follows:

We divide the block into sixteen 32-bit words w[0]-w[15] (16×32=512), reading each word low-order byte first.
We assign a, b, c, and d the initial values as specified in the general scheme. This will be the initial vector according to the Merkle-Damgård structure.
Round 1: using the functions F and FF (see the figure above), we perform 16 steps, one for each 32-bit word. Each step combines one word with a constant from a 64-entry table and a left rotation, and updates one of a, b, c, and d.
Round 2: using the functions G and GG, we perform 16 more steps, starting from the values of a, b, c, and d produced by round 1. Each word is again used once, in a different order.
Round 3: using the functions H and HH, we perform 16 more steps, starting from the output of round 2.
Round 4: using the functions I and II, we perform 16 more steps, starting from the output of round 3.
Finally, the results are added (modulo 2³²) to the values that a, b, c, and d had before the block was processed. This gives the values a’, b’, c’, d’ that are used as input to process the second 512-bit block, Y2. Notice that to get this result, the state is updated 16×4=64 times.
Then the process is repeated for all the 512-bit blocks.
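The steps above can be put together into a compact, educational MD5 sketch. It follows the structure of RFC 1321 (four rounds of 16 steps, the auxiliary functions F, G, H, I, the sine-derived constant table, and the final addition to the block's input values), but the names compress, rotl, T, and S are our own, and it is meant for reading, not for production use. It cross-checks itself against Python's hashlib.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

# Step constants T[i] = floor(2^32 * |sin(i + 1)|) and per-step rotation amounts.
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def compress(state: tuple, block: bytes) -> tuple:
    """Process one 512-bit block and return the new (a, b, c, d)."""
    w = struct.unpack("<16I", block)  # sixteen 32-bit words w[0]..w[15], little-endian
    a, b, c, d = state
    for i in range(64):
        if i < 16:    # round 1: F, words in order
            f, k = (b & c) | (~b & MASK & d), i
        elif i < 32:  # round 2: G
            f, k = (b & d) | (c & ~d & MASK), (5 * i + 1) % 16
        elif i < 48:  # round 3: H
            f, k = b ^ c ^ d, (3 * i + 5) % 16
        else:         # round 4: I
            f, k = c ^ (b | (~d & MASK)), (7 * i) % 16
        # One step: b + ((a + f + w[k] + T[i]) <<< s), then rotate the registers
        # so that the next step updates the next one.
        new_b = (b + rotl((a + f + w[k] + T[i]) & MASK, S[i])) & MASK
        a, b, c, d = d, new_b, b, c
    # Add the result to the block's input values (mod 2^32).
    return tuple((x + y) & MASK for x, y in zip(state, (a, b, c, d)))


def md5(message: bytes) -> str:
    # Padding: 0x80, zeros to 448 mod 512 bits, then K mod 2^64 little-endian.
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded += struct.pack("<Q", (len(message) * 8) % 2**64)
    state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)  # IV from RFC 1321
    for i in range(0, len(padded), 64):  # chain over the 512-bit blocks
        state = compress(state, padded[i:i + 64])
    return struct.pack("<4I", *state).hex()  # digest = a, b, c, d, little-endian


msg = b"This is my message"
print(md5(msg))
print(md5(msg) == hashlib.md5(msg).hexdigest())  # cross-check with the standard library
```

Python integers are unbounded, so every addition and every bitwise NOT is masked to 32 bits to emulate the 32-bit word arithmetic the algorithm assumes.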

Is MD5 Secure?

MD5 is not suitable for cryptographic applications: practical collision attacks against it have been known since 2004.

As stated in RFC 6151, “MD5 is no longer acceptable where collision resistance is required such as digital signatures.”

Python code to generate MD5 message digest

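import hashlib

# Compute the MD5 digest of a byte string and print it as 32 hexadecimal characters
message_digest = hashlib.md5(b'This is my message')
print(message_digest.hexdigest())

# Same message with a '.' appended at the end
message_digest = hashlib.md5(b'This is my message.')
print(message_digest.hexdigest())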

After executing the code above, you will get the following result:

rafel@Rafaels-iMac crytography % python3 md5.py
91c395cbd350da6bedfe3b24db9517b0
fd67973c602afddaa018e72836ff9370
rafel@Rafaels-iMac crytography %

Notice how the message digest changes completely when just a ‘.’ is added at the end of the message.
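One way to quantify this change is to count how many of the 128 digest bits differ between the two messages. For a well-designed hash function, a small change in the input is expected to flip roughly half of the output bits on average (the avalanche effect). The helper bit_difference below is a hypothetical name used for illustration; the exact count printed depends on the two messages.

```python
import hashlib


def bit_difference(x: bytes, y: bytes) -> int:
    """Number of bit positions in which two equal-length digests differ."""
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))


d1 = hashlib.md5(b"This is my message").digest()
d2 = hashlib.md5(b"This is my message.").digest()
print(f"{bit_difference(d1, d2)} of {len(d1) * 8} bits differ")
```

Using digest() instead of hexdigest() gives the raw 16 bytes, which makes the bitwise comparison straightforward.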

MD5 message digest from the command line

To obtain an MD5 message digest from the command line, we can use the md5 utility on macOS, or md5sum on Linux distributions.

Find below an example of how to obtain the same message digest as in the Python example above.

rafel@Rafaels-iMac ~ % echo -n This is my message | md5
91c395cbd350da6bedfe3b24db9517b0
rafel@Rafaels-iMac ~ %
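A rough Python counterpart of these command-line tools, for hashing files, is sketched below. It reads each file in chunks and feeds them to hashlib.md5 with update(), which yields the same digest as hashing the whole content at once without loading large files into memory. The function name md5_file and the 64 KiB chunk size are illustrative choices, and the output line only mimics the default "digest  filename" layout of md5sum.

```python
import hashlib
import sys


def md5_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)  # incremental updates give the same digest as one call
    return h.hexdigest()


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{md5_file(path)}  {path}")
```

Saved as, for example, md5file.py, it could be run as python3 md5file.py somefile.txt.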
