MD5 length extension attack in Python

The length extension attack exploits the fact that a hash value produced with the Merkle-Damgård construction is the internal state of the algorithm (the working values that started out as the Initial Vector) after the last block has been processed. This means that you can use the hash value to restore that internal state and keep processing more blocks of data, as if the algorithm had never finished.
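As a small illustration of that claim (a sketch with variable names of my own, not part of the attack code later in this post), an MD5 digest can be unpacked back into the four 32-bit words that make up the internal state:

```python
import hashlib
import struct

digest = hashlib.md5(b"any message").digest()

# MD5 keeps four 32-bit words (A, B, C, D) as its internal state.
# The 16-byte digest is those words written out in little-endian order.
state = struct.unpack("<4I", digest)

# Packing the words again gives back the digest: the hash hides nothing
# about the final state.
assert struct.pack("<4I", *state) == digest
print([hex(word) for word in state])
```

Anyone who sees the digest therefore also sees the state that the next block would start from.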

See below the steps needed to carry out this attack.

Steps to do a length extension attack
Why can creating a new message and a valid hash be a problem?
A graphical representation of the length extension attack
MD5 length extension attack in Python
How to prevent the length extension attack?

Steps to do a length extension attack

For the attack to be successful, we need the original message, the hash of that message, and the length of the password that was used. If the length of the password is unknown, it can be guessed by trying different lengths.

Steps to carry out a length extension attack:

1. Initialize the internal state of the hash function using the hash we got from the sender and the number of bytes that were already processed. To compute that number, we need the length of the password.
2. Process the extra message that we want to add (a.k.a. the attack). Following the Merkle-Damgård construction, we divide the extra message into blocks and apply a function f that depends on the algorithm we are using (MD5, SHA-1) until we obtain the new hash.
3. Create the new message: the original message, followed by the padding defined in the Merkle-Damgård construction, followed by the extra message (the attack).
4. Send the new message and the newly calculated hash to the receiver (the victim of the attack).
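The padding mentioned in the steps can be sketched for MD5 as follows. The helper name md5_padding is my own, and it takes a length in bytes (the padding function used later in this post is called with a length in bits), so treat it as an illustration rather than the code of any particular library.

```python
import struct

def md5_padding(total_len: int) -> bytes:
    """Return the padding MD5 appends after total_len bytes of input.

    Layout: one 0x80 byte, then zero bytes until the length is 56 mod 64,
    then the input length in bits as a 64-bit little-endian integer.
    """
    zeros = (55 - total_len) % 64
    bit_length = (total_len * 8) & 0xFFFFFFFFFFFFFFFF
    return b"\x80" + b"\x00" * zeros + struct.pack("<Q", bit_length)

# Example: 8 bytes of password plus a 40-byte message
pad = md5_padding(8 + 40)
print(len(pad), pad.hex())
```

After the padding, the total number of bytes is always a multiple of the 64-byte block size, which is why the attacker can start a fresh block right after it.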

If you want to see Python code that you can execute yourself, keep reading and you will find it by the end of this post.

Why can creating a new message and a valid hash be a problem?

When the hash function is used only to provide integrity, this attack is not a problem. The way this usually works is that someone publishes a file and the hash of that file. When you download the file, you calculate the hash value of the downloaded file and compare it with the published hash value. If they are different, you know the file was modified or corrupted on the way, so its integrity is lost.

The problem comes when you want to use hash values to sign a message, also known as providing authentication.

Just to give an example, suppose you want to sign a message using a secret key shared between you and a receiver. The message itself is not secret; you just need a mechanism for the receiver to be sure that it is you who is sending the message.

Your mechanism is to use a secret password. So, you have a message M, and you calculate the hash of the password followed by the message. Then you send the message M and the hash.

The receiver gets the message and a hash. Using the secret password (that was secretly shared between the two parties), the receiver calculates the hash of the password followed by the message and compares it to the received hash. If they match, the message is accepted as valid.
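A minimal sketch of this scheme, using hashlib and function names I made up for the illustration (sign and receiver_accepts), could look like this. It is the vulnerable construction this post is about, shown only to make the attack easier to follow.

```python
import hashlib

SECRET = b"password"  # example value; known to sender and receiver only

def sign(message: bytes) -> str:
    """Sender side: hash the secret followed by the message."""
    return hashlib.md5(SECRET + message).hexdigest()

def receiver_accepts(message: bytes, received_hash: str) -> bool:
    """Receiver side: recompute the hash and compare it to the one received."""
    return sign(message) == received_hash

message = b"Open my account and read the information"
tag = sign(message)
print(receiver_accepts(message, tag))
print(receiver_accepts(message + b" now", tag))
```

A message that is changed without updating the hash is rejected. The rest of the post shows how an attacker can update the hash without knowing the secret.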

The figure below shows this process.

Example of using a hash to authenticate.

What happens if an attacker intercepts the message M plus the hash created using the secret password? Let’s explain that below.

A graphical representation of the length extension attack

The figure below shows how the length extension attack works.

Here, an attacker who has the hash value of the message M can use that value to calculate the hash of M+M’ (the new hash value) without knowing the password used to create the hash of M.

Notice that the goal is for the receiver, after hashing the new message with the secret password and comparing the result with the hash received, to find that both are the same. Therefore, the message will be accepted as valid.

To achieve that goal, the attacker uses the fact that the hash value of message M (plus the password) is also the internal state of the algorithm after all blocks were processed.

So, the attacker initializes the internal state with the values of the hash and processes the extra data M’ (controlled by the attacker) as if it were part of the original message.

With this mechanism, you don’t need to know the password, because you have a hash that was constructed using that password and you can continue hashing more blocks from that state.

Let’s suppose we are sending a message to a person. The message is “I cannot buy the shoes you mentioned; I don’t have the money”. The sender uses a secret password to calculate the hash of the message and sends the hash so the receiver can authenticate the message.

An attacker adds extra text to the message as follows: “I cannot buy the shoes you mentioned; I don’t have the money. Can you send me 1000 to the account XXXXX?”. (In the forged message, the padding bytes from the original hash calculation sit between the original text and the added text.) The attacker also creates a new hash using the length extension attack as shown in the figure above.

Can you see the problem?

MD5 length extension attack in Python

In this case, I’m using a Python implementation of MD5. You can download it from here.

First, let’s create a function “verify” that prints whether the message and hash received are valid. The hash is valid if it can be obtained by calculating the MD5 digest of the password plus the message.

import hashlib

# md5 is the standalone Python MD5 implementation mentioned above,
# not hashlib.md5
def verify(message, received_hash):
    # Receiver side: recompute MD5(password + message) and compare
    password = bytes("password", 'utf-8')
    print('Message:')
    print(message)
    h3 = md5()
    h3.update(password + message)
    print('calculated hash: ' + h3.hexdigest())
    # Cross-check with hashlib
    hash_hashlib = hashlib.md5(password + message).hexdigest()
    print('received hash:   ' + received_hash)
    print('Hash created \n with hashlib:   ' + hash_hashlib)
    print("valid" if h3.hexdigest() == received_hash == hash_hashlib else "not valid")

All the print statements, except the last one, are optional. They are there just for your own convenience. Notice that we use the Python module hashlib as a cross-check, to make sure we are really creating the MD5 digest of the message. We use another Python implementation of MD5 because it allows us to modify the internal state, which is one of the steps of the attack. The function verify represents the steps that the receiver follows when a new message arrives.

Now, the sender (that has the same secret password as the receiver) creates a message and a hash of that message using the password.

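original_message = bytes('Open my account and read the information', 'utf-8')

def sender_calculate_hash():
    # Sender side: hash the shared password followed by the message
    password = bytes("password", 'utf-8')
    h = md5()
    h.update(password + original_message)
    # Confirm that the receiver would accept this message and hash
    verify(original_message, h.hexdigest())
    return h.digest()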

The code above shows the calculation of a hash for a certain message, named original_message in this case.

Now, suppose that an attacker intercepts the message and the hash. Let’s implement the length extension attack.

def attacker(message_len, password_len, message_hash):
    # Bytes hashed by the sender: password + message
    mlen = message_len + password_len
    # Padding the sender's MD5 added (padding() is called with a length in bits)
    mpadding = padding(mlen * 8)
    # Calculate how many bits were processed already, padding included
    bits = (mlen + len(mpadding)) * 8

    # Initialize the state with the intercepted hash
    h1 = md5(state=message_hash, count=bits)

    # To check that h1, initialized with the last state of h,
    # gives the same digest as h, uncomment:
    # hash = _encode(h1.state, md5.digest_size)
    # print('hash: ' + hash.hex())

    # The extension, as bytes, like every other input passed to update()
    attack = bytes(', after that, you can transfer all the money to the account XXXX', 'utf-8')

    h1.update(attack)
    new_hash = h1.hexdigest()

    print('*' * 50)
    new_message = original_message + mpadding + attack
    verify(new_message, new_hash)

The attacker has access to the original message and the hash of that message, and therefore knows the length of the message. The attacker also needs to know the length of the password that the sender used. That length can be found by brute force: try each candidate length until the receiver accepts the forged message.
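The brute force over password lengths can be sketched as below. The helper names (md5_padding, candidate_messages) and the upper bound of 32 are assumptions for the illustration. For each guess, the attacker would also recompute the forged hash with that guessed length and submit the pair to the receiver.

```python
import struct

def md5_padding(total_len: int) -> bytes:
    """MD5 padding for an input of total_len bytes (hypothetical helper)."""
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack("<Q", total_len * 8)

def candidate_messages(message: bytes, extra: bytes, max_password_len: int = 32):
    """Yield (guessed_length, forged_message), one pair per guessed length."""
    for guess in range(1, max_password_len + 1):
        # The padding depends on the total length: password + message
        glue = md5_padding(guess + len(message))
        yield guess, message + glue + extra

message = b"Open my account and read the information"
extra = b", after that, you can transfer all the money to the account XXXX"
for guess, forged in candidate_messages(message, extra, max_password_len=3):
    print(guess, forged)
```

Only the candidate built with the real password length contains the same padding bytes that the sender's MD5 computation used, so only that one verifies.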

Also notice that the attacker neither knows the password nor has access to it.

After executing the steps on the code above, the attacker has a new message and a new hash. The attacker sends them to the receiver.

When the receiver calculates the hash of the new message using the secret password, it matches the hash that the attacker sent.

Attack complete!
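If you do not have that MD5 implementation at hand, the same idea can be sketched in a self-contained way. The code below is my own minimal version, not the implementation used above: compress is a plain MD5 block function, and md5_padding and extend are hypothetical helper names. hashlib plays the role of the receiver.

```python
import hashlib
import math
import struct

# Per-round shift amounts and constants from the MD5 specification
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]

def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def compress(state, block):
    """Apply the MD5 block function to one 64-byte block."""
    a, b, c, d = state
    m = struct.unpack("<16I", block)
    for i in range(64):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + K[i] + m[g]) & 0xFFFFFFFF
        a, d, c, b = d, c, b, (b + rotl(f, S[i])) & 0xFFFFFFFF
    return [(x + y) & 0xFFFFFFFF for x, y in zip(state, (a, b, c, d))]

def md5_padding(total_len):
    """MD5 padding for an input of total_len bytes."""
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack("<Q", total_len * 8)

def extend(digest, hashed_len, extra):
    """Return (glue padding, forged digest) for an input of hashed_len bytes."""
    state = list(struct.unpack("<4I", digest))  # resume from the known hash
    glue = md5_padding(hashed_len)              # padding the sender's MD5 added
    processed = hashed_len + len(glue)
    tail = extra + md5_padding(processed + len(extra))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return glue, struct.pack("<4I", *state)

# Sender: the attacker never sees `secret`, only its length
secret = b"password"
message = b"Open my account and read the information"
tag = hashlib.md5(secret + message).digest()

# Attacker: knows message, tag, and len(secret)
extra = b", after that, you can transfer all the money to the account XXXX"
glue, forged_tag = extend(tag, len(secret) + len(message), extra)
forged_message = message + glue + extra

# Receiver: recomputes the hash with the secret and compares
print(hashlib.md5(secret + forged_message).digest() == forged_tag)
```

The attacker code only uses the length of the secret, never its value, yet the receiver's check succeeds.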

How to prevent the length extension attack?

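To prevent this attack, do not build a Message Authentication Code (MAC) by hashing the password followed by the message with MD5. In other words, don’t use MD5, SHA-1, or SHA-2 in this way to authenticate messages. Truncated versions of SHA-2, such as SHA-384 and SHA-512/256, are the exception, because their output does not reveal the full internal state.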

Instead, you can use HMAC, which is not vulnerable to the length extension attack.
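A minimal sketch with the hmac module from the Python standard library is shown below. The function names sign and verify_tag, the key, and the choice of SHA-256 are assumptions for the example.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(key, message), tag)

key = b"password"  # example value only
message = b"Open my account and read the information"
tag = sign(key, message)
print(verify_tag(key, message, tag))
print(verify_tag(key, message + b", and transfer all the money", tag))
```

HMAC hashes the key together with the message twice, in an inner and an outer step, so the published tag is not an internal state that an attacker can resume from.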
