Hacker News

Powered by HN Search API

Announcing the first SHA-1 collision

From https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
pfg | 2017-02-23 | 3030

Comments:

divbit

2017-02-23
So good timing to have just started working on a sha3 version of git I guess...

zurn

2017-02-23
Anyone have back of the envelope calculations for the cost of the CPU and GPU time?

ktta

2017-02-23
Here's a good blog about how SHA-1 works:

http://www.metamorphosite.com/one-way-hash-encryption-sha1-d....

The biggest risk I see with this is how torrents are affected:

https://en.wikipedia.org/wiki/Torrent_poisoning

There's also a problem with git, but I don't see it being as susceptible as torrents:

http://stackoverflow.com/a/34599081/6448137

koolba

2017-02-23
What's the impact to something like git that makes extensive use of SHA-1?

In their example they've created two PDFs with the same SHA-1. Could I replace the blob in a git repo with the "bad" version of a file if it matches the SHA-1?

jasode

2017-02-23
>Nine quintillion computations; 6,500 years of CPU; 110 years of GPU

Is there a rough calculation in terms of today's $$$ cost to implement the attack?

Asdfbla

2017-02-23
Maybe just writing 2^63 would have been easier to interpret than that huge number in the context of cryptography. (Unless you assume this targets a non-technical audience, which I doubt.)

Pretty impressive, though. And worrying, because if Google can do it, you know that state-level actors have probably been doing it for some time now (if only by throwing even more computing power at the problem).

anilgulecha

2017-02-23
Big things affected:

* DHT/torrent hashes - A group of malicious peers could serve malware for a given hash.

* Git - A commit may be replaced by another without affecting the following commits.

* PGP/GPG -- Any old keys still in use. (New keys do not use SHA1.)

* Software distribution checksums - SHA1 is the most common digest provided (many even still provide MD5).

Edit: Yes, I understand this is a collision attack. But yes, it's still an attack vector, as two colliding blocks can be generated now, with one published and widely deployed (torrent/git) and then replaced at a later date.

necessity

2017-02-23
> If you use Chrome, you will be automatically protected from insecure TLS/SSL certificates, and Firefox has this feature planned for early 2017.

No need to wait. The option to reject SHA-1 certificates on Firefox is `security.pki.sha1_enforcement_level` with value `1`.

https://blog.mozilla.org/security/2016/01/06/man-in-the-midd...

Other configs worth doing:

`security.ssl.treat_unsafe_negotiation_as_broken` to `true` and `security.ssl.require_safe_negotiation` to `true` also. Refusing insecure algorithms (`security.ssl3.<alg>`) might also be smart.

mckoss

2017-02-23
Computing a collision today costs about $100K from my reading of the paper. So most uses of SHA1 are protecting documents of far lower value, and would not be likely attack targets (today).

RyanZAG

2017-02-23
Is a 30 day disclosure period really enough for something like this? It's obviously not possible to 'fix' big systems that rely on SHA-1 such as git or github in only 30 days. What about hardware devices that use SHA-1 as a base for authenticating firmware updates?

rnhmjoj

2017-02-23
About tor: if an attacker produces a public key that collides with the SHA-1 hash of someone else's hidden service, then he would still need to generate the corresponding RSA-1024 private key, which is infeasible as of today.

Is this correct?

mabbo

2017-02-23
One practical attack using this: create a torrent of some highly desirable content- the latest hot TV show in high def or whatever. Make two copies, one that is malware free, another that isn't.

Release the clean one and let it spread for a day or two. Then join the torrent, but spread the malware-hosting version. Checksums would all check out, other users would be reporting that it's the real thing, but now you've got 1000 people purposely downloading ransomware from you- and sharing it with others.

Apparently it costs around $100,000 to compute the collisions, but so what? If I've got 10,000 installing my 1BTC-to-unlock ransomware, I'll get a return on investment.

This will mess up torrent sharing websites in a hurry.

Edit: some people have pointed out some totally legitimate potential flaws in this idea. And they're probably right, those may sink the entire scheme. But keep in mind that this is one idea off the top of my head, and I'm not any security expert. There's plenty of actors out there who have more reasons and time to think up scarier ideas.

The reality is, we need to very quickly stop trusting SHA1 for anything. And a lot of software is not ready to make that change overnight.

korm

2017-02-23
[2012] Schneier - When Will We See Collisions for SHA-1?

https://www.schneier.com/blog/archives/2012/10/when_will_we_...

Pretty close in his estimation.

icedchai

2017-02-23
> "10 years after of SHA-1 was first introduced"

wasn't SHA-1 introduced in the 90's?

jgrahamc

2017-02-23
How am I going to explain this to my wife?

Actually a serious question. How do we communicate something like this to the general public?

wickedlogic

2017-02-23
Would providing multiple SHA-1's from both the whole and N subsections (or defined regions) of the byte stream make this impractical... or at this point is the cost just going to drop and make this not relevant?

Like a NURBS based sudoku multi-hash...

cesarb

2017-02-23
On a quick scroll of the comments, I haven't seen this posted so far: http://valerieaurora.org/hash.html

We're at the "First collision found" stage, where the programmer reaction is "Gather around a co-worker's computer, comparing the colliding inputs and running the hash function on them", and the non-expert reaction is "Explain why a simple collision attack is still useless, it's really the second pre-image attack that counts".

pavfarb

2017-02-23
Now I really wonder what will happen to the Git we all know and love.

m3ta

2017-02-23
To put things into perspective, let the Bitcoin network hashrate (double SHA256 per second) = B and the number of SHA1 hashes calculated in shattered = G.

B = 3,116,899,000,000,000,000

G = 9,223,372,036,854,775,808

Every three seconds the Bitcoin mining network brute-forces the same amount of hashes as Google did to perform this attack. Of course, the brute-force approach will always take longer than a strategic approach; this comment is only meant to put into perspective the sheer number of hashes calculated.
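A quick back-of-the-envelope check of that ratio, using the figures quoted above (minimal Python sketch):

  B = 3_116_899_000_000_000_000   # approx. Bitcoin network hashrate (double-SHA256/s), as quoted above
  G = 9_223_372_036_854_775_808   # SHA-1 computations in the shattered attack

  print(G == 2**63)   # True: the "nine quintillion" figure is exactly 2^63
  print(G / B)        # ~2.96, i.e. roughly three seconds of network-wide hashing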

mikeash

2017-02-23
For those of us who are totally clueless about the construction of these hash functions, what is the fundamental flaw in SHA-1 that allows this attack? How do newer hash functions avoid it?

mtgx

2017-02-23
Never forget: when Facebook, Twitter, and Cloudflare tried to slow down SHA-1 deprecation:

https://www.facebook.com/notes/alex-stamos/the-sha-1-sunset/...

https://blog.twitter.com/2015/sunsetting-sha-1

https://blog.cloudflare.com/sha-1-deprecation-no-browser-lef...

I think Microsoft tried to do it too early on, but eventually agreed to a more aggressive timeline.

Siecje

2017-02-23
Is Mercurial impacted?

0xcde4c3db

2017-02-23
If you trust a signer, does this attack do anything to invalidate their SHA-1-based signatures? Or is the scenario strictly an attacker generating both versions of the message?

sah2ed

2017-02-23
> "Today, 10 years after of SHA-1 was first introduced, ..."

That part from the original article seems to be missing something?

Aissen

2017-02-23
I love the fact that there is a tool for detecting any collision using this algorithm: https://github.com/cr-marcstevens/sha1collisiondetection

and it's super effective: The possibility of false positives can be neglected as the probability is smaller than 2^-90.

It's also interesting that this attack is from the same author that detected that Flame (the nation-state virus) was signed using an unknown collision algorithm on MD5 (cited in the shattered paper introduction).

0x0

2017-02-23
I'm trying to play with this in git. Added the first file, committed, and then overwrote the file with the second file and committed again. But even when cloning this repository into another directory, I'm still getting different files between commit 1 and 2. What does it take to trick git into thinking the files are the same? I half expected "git status" to say "no changes" after overwriting the first (committed) pdf with the second pdf?
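(For context, a minimal sketch of what git actually hashes, assuming shattered-1.pdf and shattered-2.pdf are in the working directory: the blob ID is SHA-1 over the header "blob <size>\0" plus the file contents, and that prepended header changes the hash input, so the crafted collision no longer lines up and the two files get different blob IDs.)

  import hashlib

  def git_blob_sha1(path):
      # What `git hash-object <file>` computes: SHA-1 over "blob <size>\0" + contents.
      data = open(path, "rb").read()
      return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

  for name in ("shattered-1.pdf", "shattered-2.pdf"):
      print(git_blob_sha1(name), name)
  # The raw files share one SHA-1, but these blob IDs differ, which is why git
  # sees two distinct objects in the experiment described above.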

SadWebDeveloper

2017-02-23
It's still quite impractical. I'm sure it's possible with some quantum computer or a custom ASIC built by those "super nerds" at the NSA, but for your general adversary, aka "hackers" (skiddies IMHO), it will be infeasible.

What this means for all of you [developers] is to start new projects without SHA1 and plan on migrating old ones (only if it's totally necessary; normally you don't need to unless you use SHA1 for passwords).

A great resource for those who still don't know how or what hash to use is paragonie: https://paragonie.com/blog/2016/02/how-safely-store-password...

SamBam

2017-02-23
I'm confused by the "File Tester" at https://shattered.it/

It says "Upload any file to test if they are part of a collision attack."

When I upload either of their two sample collision documents, it says they are "Safe."

Aissen

2017-02-23
Can anyone good with AWS pricing reproduce the $100k figure for one collision? Using EC2 g2.xlarge instances I'm more at $2.8M.

tjbiddle

2017-02-23
It should also be noted that their example files have the same file size, in this case 422,435 bytes, after creating the collision - which I find fascinating!

imron

2017-02-23
I wonder if there are any 2 single commits on Github from different repositories that have the same SHA1 hash.

kazinator

2017-02-23
> In practice, collisions should never occur for secure hash functions.

That is mathematically impossible when reducing an N bit string to an M bit string, where N > M.

All hashes have collisions; it's just how hard are they to find.

e0m

2017-02-23
10 million GPUs is not insane when you have a billion dollar security cracking infrastructure budget. Especially when you compare it to the rest of the cyber warfare budget.

manwithaplan

2017-02-23
Seeing how they ridicule MD5, I think they should have spent a bit more time on the proof PDFs, and have their MD5 digests collide also.

amichal

2017-02-23
Linked http://shattered.io/ has two PDFs that render differently as examples. They indeed have the same SHA-1 and are even the same size.

  $ls -l sha*.pdf 
  -rw-r--r--@ 1 amichal  staff  422435 Feb 23 10:01 shattered-1.pdf
  -rw-r--r--@ 1 amichal  staff  422435 Feb 23 10:14 shattered-2.pdf
  $shasum -a 1 sha*.pdf
  38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
  38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf
Of course other hashes are different:

  $shasum -a 256 sha*.pdf

  2bb787a73e37352f92383abe7e2902936d1059ad9f1ba6daaa9c1e58ee6970d0  shattered-1.pdf
  d4488775d29bdef7993367d541064dbdda50d383f89f0aa13a6ff2e0894ba5ff  shattered-2.pdf 

  $md5 sha*.pdf
  
  MD5 (shattered-1.pdf) = ee4aa52b139d925f8d8884402b0a750c
  MD5 (shattered-2.pdf) = 5bd9d8cabc46041579a311230539b8d1

ratstew

2017-02-23
I got a chuckle out of the binary diff. :)

http://i.imgur.com/OmFHELl.png

yeukhon

2017-02-23
How do you actually create a collision? The paper is beyond my level of comprehension. Are we going to see someone writing up an open source tool to allow one to generate another file with the same hash?

pix64

2017-02-23
Is there any merit to using two hashing algorithms simultaneously particularly if the algorithms are very different in nature?

mate_soos

2017-02-23
I am a bit saddened that Vegard Nossum's work, which they used for encoding SHA-1 to SAT, is only mentioned as a footnote. The github code is at

https://github.com/vegard/sha1-sat

and his Master's thesis, whose quality approaches that of a PhD thesis, is here:

https://www.duo.uio.no/bitstream/handle/10852/34912/thesis-o...

Note that they also only mention MiniSat as a footnote, which is pretty bad. The relevant paper is at

http://minisat.se/downloads/MiniSat.pdf

All of these are great reads. Highly recommended.

ianaphysicist

2017-02-23
This is one of the reasons it is important to have multiple hash algorithms in use. Even when a collision can be triggered in two systems, it becomes markedly harder to trigger a simultaneous collision in other systems at that same point (payload).

jmartinpetersen

2017-02-23
Is it coincidental that GPUs on Compute Engine were announced recently? This seems like a nice burn-in test, and its completion should free up resources.

jwilk

2017-02-23
> How did you leverage the PDF format for this attack?

> A picture is worth a thousand words, so here it is.

> http://shattered.io/static/pdf_format.png

This picture is meaningless to me. Can someone explain what's going on?

polynomial

2017-02-23
It appears they are using a 2^63 hash operation attack that has been well known for nearly a decade. (Brute force of SHA-1 is 2^69.)

I wonder why they did not use the 2^52 operation attack that Schneier noted in 2009?

https://www.schneier.com/blog/archives/2009/06/ever_better_c...

jqueryin

2017-02-23
What's funny is Google still promotes SHA-1 in some of their APIs: https://developers.google.com/admin-sdk/directory/v1/guides/...

bch

2017-02-23
I wish there were sample documents, but if one had two computed hashes, would that mitigate this SHA1-shattered flaw? e.g. good_doc.pdf sha1=da39a3ee5e6b4b0d3255bfef95601890afd80709, md5=d41d8cd98f00b204e9800998ecf8427e? With the sample project I'm looking at (GraphicsMagick) on Sourceforge for example, it provides both SHA-1 and MD5 hashes...

jeffdavis

2017-02-23
Does git have any path away from SHA1?

I know the attack isn't practical today, but the writing is on the wall.

goncalomb

2017-02-23
https://security.googleblog.com : "This website uses a weak security configuration (SHA-1 signatures), so your connection may not be private."

rodionos

2017-02-23

lisper

2017-02-23
This point seems to be getting re-hashed (no pun intended) a lot, so here's a quick summary: there are three kinds of attacks on cryptographic hashes: collision attacks, second-preimage attacks, and first-preimage attacks.

Collision attack: find two documents with the same hash. That's what was done here.

Second-preimage attack: given a document, find a second document with the same hash.

First-preimage attack: given an arbitrary hash, find a document with that hash.

These are in order of increasing severity. A collision attack is the least severe, but it's still very serious. You can't use a collision to compromise existing certificates, but you can use them to compromise future certificates because you can get a signature on one document that is also valid for a different document. Collision attacks are also stepping stones to pre-image attacks.

UPDATE: some people are raising the possibility of hashes where some values have 1 or 0 preimages, which makes second and first preimage attacks formally impossible. Yes, such hashes are possible (in fact trivial) to construct, but they are not cryptographically secure. One of the requirements for a cryptographically secure hash is that all possible hash values are (more or less) equally likely.
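If it helps to see that severity ordering concretely, here is a toy Python sketch (not SHA-1 cryptanalysis, just a 16-bit truncation of SHA-1 so the generic attacks finish instantly): a birthday-style collision search succeeds after a few hundred tries, while a brute-force preimage search needs on the order of 2^16.

  import hashlib, itertools

  def h16(data):
      # Toy hash: first 16 bits of SHA-1, small enough to attack by brute force.
      return int.from_bytes(hashlib.sha1(data).digest()[:2], "big")

  # Collision attack: find *any* two messages with the same hash (birthday bound).
  seen = {}
  for i in itertools.count():
      m = str(i).encode()
      d = h16(m)
      if d in seen:
          print("collision after", i + 1, "tries:", seen[d], m)
          break
      seen[d] = m

  # First-preimage attack: given a target hash, find a message that produces it.
  target = h16(b"some fixed document")
  for i in itertools.count():
      m = b"x" + str(i).encode()
      if h16(m) == target:
          print("preimage after", i + 1, "tries:", m)
          break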

nneonneo

2017-02-23
The visual description of the colliding files, at http://shattered.io/static/pdf_format.png, is not very helpful in understanding how they produced the PDFs, so I took apart the PDFs and worked it out.

Basically, each PDF contains a single large (421,385-byte) JPG image, followed by a few PDF commands to display the JPG. The collision lives entirely in the JPG data - the PDF format is merely incidental here. Extracting out the two images shows two JPG files with different contents (but different SHA-1 hashes since the necessary prefix is missing). Each PDF consists of a common prefix (which contains the PDF header, JPG stream descriptor and some JPG headers), and a common suffix (containing image data and PDF display commands).

The header of each JPG contains a comment field, aligned such that the 16-bit length value of the field lies in the collision zone. Thus, when the collision is generated, one of the PDFs will have a longer comment field than the other. After that, they concatenate two complete JPG image streams with different image content - File 1 sees the first image stream and File 2 sees the second image stream. This is achieved by using misalignment of the comment fields to cause the first image stream to appear as a comment in File 2 (more specifically, as a sequence of comments, in order to avoid overflowing the 16-bit comment length field). Since JPGs terminate at the end-of-file (FFD9) marker, the second image stream isn't even examined in File 1 (whereas that marker is just inside a comment in File 2).

tl;dr: the two "PDFs" are just wrappers around JPGs, which each contain two independent image streams, switched by way of a variable-length comment field.
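For anyone who wants to poke at the bytes themselves, a rough sketch (assuming shattered-1.pdf and shattered-2.pdf are in the current directory) that scans for candidate JPEG comment markers (0xFF 0xFE) and prints the 16-bit length that follows each one. It's a blunt byte search, so expect some false hits, but the differing comment length described above should show up.

  import struct

  def com_candidates(path):
      # Naive scan for JPEG COM markers (0xFF 0xFE) and their declared lengths.
      data = open(path, "rb").read()
      hits = []
      i = data.find(b"\xff\xfe")
      while i != -1 and i + 4 <= len(data):
          (length,) = struct.unpack(">H", data[i + 2:i + 4])
          hits.append((i, length))
          i = data.find(b"\xff\xfe", i + 2)
      return hits

  for name in ("shattered-1.pdf", "shattered-2.pdf"):
      print(name, com_candidates(name)[:5])   # first few candidates only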

yuhong

2017-02-23
identical prefix, not chosen prefix. I was more interested in an SHA-1 collision ASIC.

Zhenya

2017-02-23
It's telling/ironic that the google security blog will not display the text without javascript enabled.

mrb

2017-02-23
Related: someone claimed a 2.48 BTC (~2800 USD) bounty by using this SHA1 collision data: https://news.ycombinator.com/item?id=13714987

donatj

2017-02-23
So from a security standpoint if my hash was a sha-1 concatenated to an MD5, how long would it be before they found a collision?

Phithagoras

2017-02-23

mysterydip

2017-02-23
Forgive my ignorance, but it seems a solution to collision worries is to just use two hashing algorithms instead of one. We have two factor authentication for logins, why not the equivalent for hashed things?

Give me the sha1 and md5, rather than one or the other. Am I wrong in thinking even if one or both are broken individually, having both broken for the same data is an order of magnitude more complex?
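For what it's worth, a minimal sketch of publishing and checking both digests with hashlib (illustrative only). One caveat from the literature: for iterated hashes like MD5 and SHA-1, Joux-style multicollisions mean the concatenation is generally believed to offer little more collision resistance than the stronger of the two alone, so the gain is smaller than intuition suggests.

  import hashlib

  def dual_digest(path):
      # Publish both digests for one artifact.
      data = open(path, "rb").read()
      return hashlib.sha1(data).hexdigest(), hashlib.md5(data).hexdigest()

  def verify(path, sha1_hex, md5_hex):
      # Only accept the file if *both* published digests match.
      s, m = dual_digest(path)
      return s == sha1_hex and m == md5_hex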

orasis

2017-02-23
" Today, 10 years after of SHA-1 was first introduced, we are announcing the first practical technique for generating a collision."

Huh? It's been around a lot longer than 10 years.

Lord_Yoda

2017-02-23
As a consumer, how do I identify sites which are vulnerable? What should I do to protect my data?

matt_wulfeck

2017-02-23
> We then leveraged Google’s technical expertise and cloud infrastructure to compute the collision which is one of the largest computations ever completed.

And this, my friends, is why the big players (google, Amazon, etc) will win at the cloud offering game. When the instances are not purchased they can be used extensively internally.

korethr

2017-02-23
So, since Git uses SHA-1, does this mean we're going to see a new major version number of Git that uses SHA-2 or SHA-3 in a few years?

I don't expect one overnight. For one, as noted, this is a collision attack, one which took a large amount of computing power to achieve. In light of that, I don't think the integrity of git repos is in immediate danger. So I don't think it'd be an immediate concern of the Git devs.

Secondly, wouldn't moving to SHA-2 or SHA-3 be a compatibility-breaking change? I'd think that would be painful to deal with, especially the larger the code base, or the more activity it sees. Linux itself would be a worst-case scenario in that regard. But if it can be pulled off for Linux, then I'd think any other code base should be achievable.

wfunction

2017-02-23
It's kind of odd that over 9 months ago it was known that Microsoft would stop honoring SHA-1 certificates starting from 1 week ago. Anyone know if this is just a pure coincidence? See https://blogs.windows.com/msedgedev/2016/04/29/sha1-deprecat...

bobbyyankou

2017-02-23
Can someone help me understand what the major distinction is between this accomplishment (SHAttered) and the same team's The SHAppening (2015)?

It looks like they did the same thing or something similar in 2^57.5 SHA1 calculations back then versus 2^63 SHA1 calculations this time.

kyleblarson

2017-02-23
What's the over/under on how long ago the NSA accomplished this?

madhorse

2017-02-23
"But MD5 is still okay right?" -Laravel Developer

bekimdisha

2017-02-23
but ... but .... half of the web is no longer secure? ... :D

macawfish

2017-02-23
So is this why Google asked me to type in my password this afternoon? Cause I was kinda cautious about that, but still did it...

wapz

2017-02-23
I don't know too much about hashing/encryption, but if you salt the sha-1 will this still be able to find a collision?

dingo_bat

2017-02-23
> Hash functions compress large amounts of data into a small message digest.

My understanding of crypto concepts is very limited, but isn't this inaccurate? Hash functions do not compress anything.

They have an image too which says "<big number> SHA-1 compressions performed".

Seems weird to see basic mistakes in a research disclosure.

userbinator

2017-02-23
It's interesting to note that when the first MD5 collisions were discovered a bit over a decade ago, they were computed by hand calculation. Next came the collision generators like HashClash/fastcoll (remember these?) which could generate colliding MD5 blocks within a few seconds on hardware of the time. I wonder how long it will be before the same can be done for SHA-1, because it seems here that they "simply" spent a large amount of computing power to generate the collision, but I'm hopeful that will be reduced very soon.

As for what I think in general about it: I'm not concerned, worried, or even scared about the effects. If anything, inelegance of brute-force aside, I think there's something very beautiful and awe-inspiring in this discovery, like solving a puzzle or maths conjecture that has remained unsolved for many years.

I remember when I first heard about MD5 and hash functions in general, and thinking "it's completely deterministic. The operations don't look like they would be irreversible. There's just so many of them. It's only a matter of time before someone figures it out." Then, years later, it happened. It's an interesting feeling, especially since I used to crack software registration-key schemes, which often resembled hash functions, and "reversing" the algorithms (basically a preimage attack) was simply a matter of time and careful thought.

There's still no practical preimage for MD5, but given enough time and interest... although I will vaguely guess that finding SHA-256 collisions probably has a higher priority to those interested.

mrybczyn

2017-02-23
SHA-1 isn't broken until someone makes a multi step quine that hashes to the same value at every stage!

BTW quine relay is impressive: https://github.com/mame/quine-relay

nurettin

2017-02-23
"as google, we spent two years to research a way of generating sha-1 collisions and made quintillions of computations to generate an example" <- not very convincing or practical. It's like those japanese animes where the nerdy kid boasts about having computed your every move.

RealNeatoDude

2017-02-23
> Google has advocated the deprecation of SHA-1 for many years, particularly when it comes to signing TLS certificates.

Why? Was it in anticipation of this attack specifically?