Digital megabreaches have lately become so commonplace as to be almost indistinguishable on the alarm scale---a hundred million passwords stolen from one social media service one day, a few hundred million more the next. It all becomes a depressing blur. But not all password disasters are equally disastrous. And the difference between a Three Mile Island and a Hiroshima sometimes comes down to an arcane branch of cryptography: hashing.
When hackers compromise a company to access its collection of users’ passwords, what they find and steal isn’t stored in a form that’s readable by humans---at least if the company has even a pretense of security. Instead, the cache of passwords is often converted into a collection of cryptographic hashes, random-looking strings of characters into which the passwords have been mathematically transformed to prevent them from being misused. This transformation is called hashing. But just what sort of hashing those passwords have undergone can mean the difference between the thieves ending up with scrambled text that takes years to decipher or successfully “cracking” those hashes in days or hours to convert them back to usable passwords, ready to access your sensitive accounts.
A hash is designed to act as a "one-way function": a mathematical operation that's easy to perform but very difficult to reverse. Like encryption, it turns readable data into a scrambled string. But unlike typical encryption, which lets someone holding a specific key decrypt that data, a hash isn't designed to be decrypted at all. Instead, when you enter your password on a website, the site simply runs the same hash on what you typed and checks the result against the hash it created of your password when you chose it, verifying the password's validity without having to store the sensitive password itself.
"A hash usually takes an input, does something with it, and what comes out looks like random data," says Jens "Atom" Steube, the creator of the popular hash-cracking software Hashcat. "When you input the same data again, the data that comes out will be exactly the same. And that's how a service knows that the input was correct."
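To make Steube's point concrete, here is a minimal sketch of how a login system might store and check a password hash. It is deliberately naive: the in-memory `users` dictionary and the `register`/`login` helpers are hypothetical, and the single unsalted SHA-256 pass is used only to show that the same input always produces the same output. It is not a safe way to store real passwords.

```python
import hashlib
import hmac

# Hypothetical in-memory "user database": username -> stored hash (hex).
# Deliberately naive: one unsalted SHA-256 pass, used only to illustrate
# that hashing is deterministic. Real systems need a slow, salted scheme.
users: dict[str, str] = {}


def hash_password(password: str) -> str:
    """Hash the password; identical input always yields identical output."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    # Store only the hash, never the password itself.
    users[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # Re-hash whatever the user typed and compare it with the stored hash.
    # hmac.compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(hash_password(password), stored)


register("alice", "correct horse battery staple")
print(login("alice", "correct horse battery staple"))
print(login("alice", "Tr0ub4dor&3"))
```

The service never needs the original password after registration; it only needs to recompute the hash and see whether the two outputs match.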
In theory, no one, not a hacker or even the web service itself, should be able to take those hashes and convert them back into passwords. But in practice, some hashing schemes are significantly harder to reverse than others. The collection of 177 million LinkedIn accounts stolen in 2012 that went up for sale on a dark web market last week, for instance, had actually been hashed. But the company used only a simple hashing function called SHA1 without extra protections, allowing almost all the hashed passwords to be trivially cracked. As a result, hackers were able not only to access the passwords but also to try them on other websites, likely leading to Mark Zuckerberg having his Twitter and Pinterest accounts hacked over the weekend.
By contrast, a breach at the crowdfunding site Patreon last year exposed passwords that had been hashed with a far stronger function called bcrypt, which likely kept the full cache relatively secure despite the breach. That's according to Rick Redman, a penetration tester at the firm KoreLogic who runs a password-cracking competition at the annual Defcon hacker conference. "The strength of the hash is the insurance policy. It tells you how much time users have to change their passwords after a data breach before they come to harm," Redman says. "If it’s just SHA1, there is no window... If it’s bcrypt, you have time to run away and change all your passwords."
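Redman's "window" is essentially a matter of guesses per second. The sketch below measures, on whatever machine runs it, how many guesses per second a single CPU core can test against one pass of SHA1 versus a deliberately slow function. Two assumptions to flag: PBKDF2-HMAC-SHA256 with 100,000 iterations stands in for a slow hash because it ships with Python's standard library (it is not bcrypt), and the 8-character lowercase keyspace is a hypothetical target. Real crackers running Hashcat on GPUs are far faster than Python on one core, so only the ratio between the two rates is meaningful here.

```python
import hashlib
import time


def fast_hash(pw: bytes) -> bytes:
    # One unsalted pass of SHA1, similar in spirit to LinkedIn's 2012 setup.
    return hashlib.sha1(pw).digest()


def slow_hash(pw: bytes, salt: bytes = b"demo-salt", iterations: int = 100_000) -> bytes:
    # PBKDF2-HMAC-SHA256 with many iterations: a stand-in for a deliberately
    # slow scheme. This is NOT bcrypt, just a stdlib substitute.
    return hashlib.pbkdf2_hmac("sha256", pw, salt, iterations)


def guesses_per_second(hash_fn, trials: int) -> float:
    """Time `trials` hash computations and return the rate per second."""
    start = time.perf_counter()
    for i in range(trials):
        hash_fn(f"guess{i}".encode())
    elapsed = time.perf_counter() - start
    return trials / elapsed


fast_rate = guesses_per_second(fast_hash, 200_000)
slow_rate = guesses_per_second(slow_hash, 5)

# Hypothetical target: every 8-character lowercase password (26**8 options).
keyspace = 26 ** 8
for name, rate in [("SHA1", fast_rate), ("PBKDF2 x100k", slow_rate)]:
    days = keyspace / rate / 86_400
    print(f"{name}: {rate:,.0f} guesses/s -> ~{days:,.1f} days to try every candidate")
```

Whatever the absolute numbers on a given machine, the slow function multiplies the attacker's time by the same large factor, which is exactly the extra time Redman describes users gaining to change their passwords.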
To see the difference between those hashing schemes, consider how password hash-cracking works: Hackers can't reverse a hashed password created with a function like SHA1. But they can simply try guessing passwords and running them through the same function. When they find a matching hash, they know they've hit on the right password. A hash-cracking program working on a large database of hashes can guess many millions or billions of possible passwords and automatically compare the results with an entire collection of stolen hashed passwords to find matches.
"What a password cracker does is not black magic. It does the same thing as the legitimate login system," says Hashcat creator Steube. "It computes the hash of some input and compares the garbage that comes out [to a hash.] If it matches, the password was correct. The more often it can do that, the higher your chances are to find the password."
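The guess-hash-compare loop Steube describes can be sketched in a few lines. In this illustrative example the "leaked" database and its passwords are made up, the hashes are unsalted SHA1, and the cracker simply tries every string of up to three characters. Because the stolen hashes sit in a dictionary, each guess is checked against the entire database in a single lookup, which is why cracking a large dump at once is so efficient.

```python
import hashlib
import itertools
import string

# Hypothetical leaked database: unsalted SHA1 hash (hex) -> username.
leaked = {
    hashlib.sha1(pw.encode()).hexdigest(): user
    for user, pw in [("bob", "abc"), ("carol", "zz9"), ("dave", "a1b2")]
}


def brute_force(hashes: dict[str, str], alphabet: str, max_len: int) -> dict[str, str]:
    """Try every string up to max_len; return {username: recovered password}."""
    cracked: dict[str, str] = {}
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            guess = "".join(chars)
            digest = hashlib.sha1(guess.encode()).hexdigest()
            # One dictionary lookup tests this guess against every stolen hash.
            if digest in hashes:
                cracked[hashes[digest]] = guess
                if len(cracked) == len(hashes):
                    return cracked  # everything recovered; stop early
    return cracked


result = brute_force(leaked, string.ascii_lowercase + string.digits, 3)
print(result)
```

Running it recovers the two three-character passwords after roughly 48,000 guesses, while dave's four-character password survives only because the search stopped at length three. Each extra character multiplies the work, but a fast hash leaves the attacker plenty of room to keep going.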
Hash-cracking schemes have for decades been locked in a cat-and-mouse game with the security community's attempts to make hashing more secure. Switching from general-purpose processors (CPUs) to graphics processors (GPUs) allowed password crackers to exploit those chips' ability to perform many simple tasks in parallel, accelerating their guessing as much as a thousandfold. Hash-crackers have also developed so-called "rainbow tables," immense precomputed tables that let them look up the hashes of enormous numbers of possible passwords instead of computing each one anew. And modern password crackers don't merely guess passwords at random: They use "dictionary attacks" that cycle through real words and collections of known common passwords from past breaches, and they apply statistical analyses of those passwords to find patterns that allow new passwords to be guessed faster. (LinkedIn's 177 million passwords will no doubt give password crackers plenty of new material to study for developing future hash-cracking techniques.)
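The sketch below combines two of those ideas: a dictionary attack that applies simple "mangling rules" to each word, and a table of hashes computed ahead of time so that cracking a stolen hash becomes a single lookup. Both the word list and the rule set are hypothetical and tiny. Note also that this is a plain lookup table, a simpler relative of a true rainbow table, which stores chains of hashes and trades extra computation for much less storage.

```python
import hashlib

# Hypothetical wordlist; real crackers use millions of words and leaked passwords.
wordlist = ["password", "dragon", "monkey", "linkedin", "sunshine"]


def mangle(word: str):
    """Yield common human variations of a word (a tiny, made-up rule set)."""
    yield word
    yield word.capitalize()
    for suffix in ("1", "123", "!", "2012"):
        yield word + suffix
        yield word.capitalize() + suffix
    # Leetspeak substitution, e.g. "password" -> "p4ssw0rd".
    yield word.translate(str.maketrans("aeio", "4310"))


def build_lookup_table(words) -> dict[str, str]:
    """Precompute unsalted SHA1 hash -> candidate once; lookups are then instant."""
    return {
        hashlib.sha1(candidate.encode()).hexdigest(): candidate
        for word in words
        for candidate in mangle(word)
    }


table = build_lookup_table(wordlist)

stolen = [hashlib.sha1(pw.encode()).hexdigest() for pw in ("Monkey123", "p4ssw0rd", "xK#9vq!Lz")]
for h in stolen:
    print(h[:12], "->", table.get(h, "<not in table>"))
```

The human-style passwords fall immediately because they are predictable variations of common words, while the random string is never generated by the rules. The table also only works because the hashes are unsalted: the same password always produces the same hash, so one precomputed table serves every victim.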
The security world has responded with its own tricks to slow, if not altogether stop, password hash-cracking. To defeat precomputation, hashing schemes now use a trick called "salting": adding random data to each password before hashing it and storing that "salt" value alongside the hash. Because every password gets a different salt, an attacker would need a separate precomputed table for each salt, which makes precomputation pointless. (LinkedIn didn't even go that far with its 2012 password collection.) And modern hashing functions like bcrypt and Argon2 don't simply run a password through a fast function like SHA1 once; they deliberately repeat expensive internal work many times over, reprocessing the resulting data again and again. They also require that data be stored in memory and read back repeatedly, creating a bottleneck: A GPU can't easily split the work into many parallel tasks when every step has to wait on memory.
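Salting and deliberate slowness can be illustrated with Python's standard library. This minimal sketch uses PBKDF2-HMAC-SHA256, which is neither bcrypt nor Argon2 and is not memory-hard; it is chosen only because it ships with Python and shows both ideas clearly. The helper names and the 200,000-iteration work factor are illustrative assumptions, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; real systems tune this to their hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key), using a fresh random 16-byte salt per password."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key  # both are stored; the salt is not secret


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    # Repeat the same salted, iterated computation and compare in constant time.
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(key, expected)


salt_a, key_a = hash_password("hunter2")
salt_b, key_b = hash_password("hunter2")
print(key_a == key_b)
print(verify_password("hunter2", salt_a, key_a))
print(verify_password("hunter3", salt_a, key_a))
```

The same password stored twice yields two unrelated hashes, so a precomputed table or a single guess can no longer be checked against a whole database at once, and each guess now costs hundreds of thousands of inner hash computations. For the memory-hard property described above, the standard library also offers `hashlib.scrypt`, and bcrypt and Argon2 are available through third-party packages such as `bcrypt` and `argon2-cffi`.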
After a password breach hits, it's difficult to determine how securely the stolen passwords were hashed. Companies rarely reveal what hashing functions they've used. And even when they do, leaked passwords can be more vulnerable than they seem: Hacked hookup site Ashley Madison's 36 million leaked passwords were hashed with bcrypt, but 15 million of them were also somehow stored on the company's servers with much weaker hashing, allowing crackers to derive 11 million of the passwords in days rather than the decades bcrypt would have required.
All of that means your passwords' security still depends mostly on you. Use complex, hard-to-guess passwords (or better yet, passphrases or random strings chosen by a password manager application) that won't be quickly guessed by hash-cracking programs. Don't reuse passwords across services: If a single one is successfully cracked, every account that shares it is endangered. And regardless of what a company says after it reveals a security breach, change your password. Hashing schemes are clever. But don't bet your security that hash-crackers won't outsmart them.
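The randomness a password manager provides can be sketched with Python's `secrets` module, which is designed for security-sensitive random choices. In this illustrative example the helper names and the eight-word demo list are hypothetical. The entropy figure is the standard estimate for independent, uniformly random picks (number of picks times log2 of the pool size). It applies only to machine-generated secrets, not to passwords a person invents.

```python
import math
import secrets
import string


def random_password(length: int = 20) -> str:
    """Draw each character independently from letters, digits and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation  # 94 symbols
    return "".join(secrets.choice(alphabet) for _ in range(length))


def random_passphrase(wordlist: list[str], n_words: int = 6, sep: str = "-") -> str:
    """Pick words uniformly at random; strength depends on the list's size."""
    return sep.join(secrets.choice(wordlist) for _ in range(n_words))


def entropy_bits(pool_size: int, picks: int) -> float:
    """Bits of entropy for `picks` independent uniform choices from `pool_size` options."""
    return picks * math.log2(pool_size)


# Hypothetical tiny word list for the demo only; it is far too small for real use.
demo_words = ["orbit", "velvet", "canyon", "pickle", "lantern", "mosaic", "thunder", "walnut"]

print(random_password())
print(random_passphrase(demo_words))
print(f"20 random characters: {entropy_bits(94, 20):.1f} bits")
print(f"6 words from a 7,776-word list: {entropy_bits(7776, 6):.1f} bits")
```

Each bit of entropy doubles the number of guesses a cracker must make on average, so a genuinely random 20-character string stays out of reach even of a fast, unsalted hash. A passphrase drawn from a diceware-style list of 7,776 words trades some of that strength for memorability.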