Home » Blog » What is the Merkle-Damgård construction?

What is the Merkle-Damgård construction?


The Merkle-Damgård construction is a scheme to design collision-resistant cryptographic hash functions. The scheme uses one-way compression functions that are collision-resistant. The original message is extended so it has a size of a specific multiple (i.e. 512 or 1024). The message is then divided into blocks (512 bits or 1024 bits). Then, we apply the compression function to each block and an input vector iteratively. Lastly, we apply a finalization function to the modified input vector and the output is the hash value (also called a message digest).

The scheme is based on the proof that if f is collision-resistant, then H is also collision-resistant.

So, if we choose a one-way function f that is collision-resistant, by following the Merkle-Damgård structure we can obtain a cryptographic hash function.

Description of the Merkle-Damgård construction

Find below an illustration of this construction.

Merkle-Damgård construction
Merkle-Damgård construction

The first step is to expand the message to a length that is a multiple of a specific number of bits. The reason for this step is that compression functions cannot take arbitrary size inputs.

To expand the original message we use two blocks. The first one is the padding block and, the second one represent the size of the message.

We create the padding block as a ‘1’ followed by ‘0’s, as you can see in the figure above under “Expansion”.

The reason for the padding block to start with a 1 is to avoid collisions. Suppose you have a message M and you add a padding block of 0s. Let’s create a message M’ that is M plus a 0 at the end. Then the hash of M will be equal to the hash of M’. You won’t be able to know if the 0 after M is the padding block added to M, or if it is the end of M’.

The size of the padding block plus the size of the message must be a multiple of a fixed number (i.e. 512) minus 64.  This is because we must use the last 64 bits (K block from the figure) to specify the size of the original message. We use the size of the message to create the hash to avoid the length extension attack.

We process the expanded message in blocks by concatenating them using an initial vector (IV) and a one-way collision-resistant function f. The value of the initial vector will change with each concatenation and at the end, we obtain the hash value.

The one-way collision-resistant function will be based on logical operations AND, OR, XOR, etc.

The size of the IV will determine the size of the hash value. For instance, in the MD5 algorithm, the IV size is 32×4=128 bits (4 words ABCD of 32 bits each one), and the size of the MD5 message digest is 128. In the case of SHA-1, the initial vector size is 32×5=160 bits (5 words ABCDE of 32 bits each one) and the size of the hash value (or message digest) is 160 bits.

Related posts: