Google announces the first SHA-1 Collision in a PDF document

Google has just demonstrated the first SHA-1 collision by showing two PDFs with the same unique identifier but different content, raising an interesting question about how cryptographic hash functions work in electronic signatures.

Making a Hash of it: hash functions in e-signature

Following the news from Google, our techies have been explaining to us mere mortals in the marketing department the technicalities of hash functions. We have attempted to digest this and evaluate the extent to which you, as an e-signature user, might take an (albeit passing) interest in cryptographic hash functionality… still with us?

Hash collisions and e-signatures

Google has just produced two different PDFs that have identical identifiers, or “hash values”, in what they have termed a “hash collision.” What does that mean? Why does it matter? And what is a hash collision anyway?

What is a hash?

A cryptographic hash function is a function that takes in data of any size and produces a short, fixed-length code at the other end, referred to as the “hash value”. The hash value is unique for all practical purposes: changing even one character of the input produces a completely different value. If you have the original input data, you can show that it produces that hash value, but you cannot work backwards from the hash value to recover the data. This identifier functionality is used every day across online applications such as online document filing systems and encrypted messaging applications.
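For the more hands-on reader, here is a minimal sketch of a hash function in action, using Python's standard hashlib module. The two sample sentences and the helper name sha256_hex are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash value of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Two illustrative inputs that differ by a single character.
original = b"I agree to pay 100 pounds."
altered = b"I agree to pay 900 pounds."

# Both hash values are 64 hex characters long, but they look unrelated.
print(sha256_hex(original))
print(sha256_hex(altered))
```

Whatever the size of the input, the hash value has the same length, and the same input always gives the same hash value.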

So what is a hash collision?

The theory is that it should be practically impossible to find two different sets of input data that produce the same hash value. Because hash values have a fixed length, such pairs must exist in principle; the security of the function rests on nobody being able to find one. It is this that the Google team have just done by generating a “hash collision” in “SHA-1,” one of the cryptographic algorithms used to create these identifiers. It has taken them two years to do so, and the result is two PDF documents that have identical SHA-1 hashes but different content.
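To see what a collision looks like, the sketch below uses a deliberately weakened toy hash: only the first two bytes of a SHA-1 digest. With so few possible values, two different inputs that share a hash value turn up almost immediately. This is not the technique Google used against full SHA-1; the names short_hash and find_collision and the "document-N" inputs are invented for the example.

```python
import hashlib
from itertools import count


def short_hash(data: bytes, length: int = 2) -> bytes:
    """Toy hash: the first `length` bytes of the SHA-1 digest of `data`."""
    return hashlib.sha1(data).digest()[:length]


def find_collision(length: int = 2):
    """Try inputs in turn until two different ones share a toy hash value."""
    seen = {}  # maps toy hash value -> the first input that produced it
    for i in count():
        message = f"document-{i}".encode()
        value = short_hash(message, length)
        if value in seen:
            return seen[value], message
        seen[value] = message


first, second = find_collision()
print(first, second, short_hash(first).hex())
```

A two-byte hash has only 65,536 possible values, so a match is guaranteed within 65,537 attempts. The full 160-bit SHA-1 value is what made a real collision so much harder to produce.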

How does hash functionality apply to e-signature?

The quality of an e-signature solution depends on the extent to which it can prove three elements:

Authentication: i.e. the document was signed (the data was created) by a known sender
Non-repudiation: i.e. the signer cannot deny having signed the document
Integrity: the data has not been altered in transit

Cryptographic hash functionality has a clear application to electronic and digital signatures: the hash value recorded at signing lets you later prove the integrity and authenticity of the original data by hashing it again and comparing the results. If the input data is changed in any way (i.e. the document or certification is tampered with), its hash value no longer matches and you can no longer use that document to prove authentication or integrity.
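The integrity check described above can be sketched in a few lines. This is a simplified illustration, not a description of how any particular e-signature platform works: a real digital signature also signs the hash value with the signer's private key, which is what supports authentication and non-repudiation. The function names and contract text are hypothetical.

```python
import hashlib
import hmac


def fingerprint(document: bytes) -> str:
    """Hash value of the document, as recorded at the time of signing."""
    return hashlib.sha256(document).hexdigest()


def is_unaltered(document: bytes, recorded_hash: str) -> bool:
    """Re-hash the document and compare it with the recorded hash value."""
    return hmac.compare_digest(fingerprint(document), recorded_hash)


signed_copy = b"Contract v1: the fee is 100 pounds."
recorded = fingerprint(signed_copy)

tampered_copy = b"Contract v1: the fee is 900 pounds."

print(is_unaltered(signed_copy, recorded))    # True
print(is_unaltered(tampered_copy, recorded))  # False
```

A hash collision undermines exactly this check: if two different documents share a hash value, the comparison cannot tell them apart.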

The bigger picture

The tech community has for a while been moving away from the SHA-1 algorithm. We use the more robust SHA-256 algorithm for all our transmissions, and the Legalesign platform ensures you get top-spec e-signature quality by offering Certified PDFs with Long Term Validation (something for another day!).

But to put this in context, the probability of a SHA-256 collision taking place is minuscule. A discussion on Stack Overflow illustrates it: our favourite is a statement claiming that a “mass-murderer space rock” (which happens every 30 million years on average) is 45 orders of magnitude more probable than a SHA-256 collision. So it may take the boffins at Google a little longer to break that!
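For a rough feel for the numbers, the sketch below uses the standard "birthday bound" approximation for the chance that any two of n randomly chosen inputs happen to share a hash value. It covers accidental collisions only, not a deliberate attack of the kind Google mounted on SHA-1, and the figure of a trillion documents is an arbitrary assumption.

```python
def collision_probability(num_items: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~ n^2 / 2^(bits + 1).

    Only meaningful when the result is much smaller than 1.
    """
    return num_items ** 2 / 2 ** (hash_bits + 1)


documents = 10 ** 12  # assumed: a trillion hashed documents

print(f"SHA-1   (160 bits): {collision_probability(documents, 160):.1e}")
print(f"SHA-256 (256 bits): {collision_probability(documents, 256):.1e}")
```

Each extra bit in the hash value halves the estimate, which is why the longer SHA-256 value gives so much more headroom than SHA-1.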
