Recently, researchers from the New York Institute of Technology and the Stevens Institute of Technology computer science department released a paper about using a relatively new machine learning technique to make computers 18-24 percent better at guessing your passwords than ever before. In this article, we’ll cover:
- The historical context in which this technology lives
- What this technology could be used for by criminals
- How this technology works
- How to use it if you’re a red-teamer
- How to protect yourself as a user
- How to protect yourself as a blue-teamer in charge of an enterprise
First, let's talk about what we mean when we talk about “guessing passwords,” as that’s a fairly nebulous term. In this case, we’re talking about cracking password hashes offline. That may leave you asking “What are password hashes?” A fair question.
What are Password Hashes?
When you see large data breaches like Dropbox, LinkedIn, Ashley Madison, etc., what was released was typically a list of email addresses and password hashes. The point of a hash in a security context is that its input cannot feasibly be determined from its output. For example, say I give you “5f4dcc3b5aa765d61d8327deb882cf99.” That’s the MD5 hash of “password.” You cannot determine that 5f4dcc3b5aa765d61d8327deb882cf99 is the hash of “password” unless you try hashing “password” and see that the result is 5f4d...cf99. There are no shortcuts, and you cannot go backwards.
This is useful in security because it means that a site like Adobe or LinkedIn or Google can have hundreds of millions of user accounts without storing anyone’s actual password, yet still check whether you know yours. The site stores only the hash of each password. When a user wants to log in, they send their password, the system hashes it, and then it checks whether the result matches the hash stored in the database for that email address.
This slows down attackers, but it does not stop them entirely. Using modern tools, attackers can (depending on which hashing algorithm they are trying to crack) guess hundreds of thousands to hundreds of millions of passwords per second. As a result, attackers can test candidate passwords faster than they can come up with good candidates to test.
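To make the mechanics concrete, here is a minimal Python sketch of storing a hash, checking a login against it, and guessing offline. The user table, email address, and function names are made up for illustration, and unsalted MD5 is used only because it matches the example above; this is not a recommendation for how a real site should store passwords.

```python
import hashlib
import hmac


def md5_hex(password: str) -> str:
    """Return the hex MD5 digest of a password (MD5 only to match the example above)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical user table: the site stores hashes, never the passwords themselves.
stored_hashes = {"alice@example.com": "5f4dcc3b5aa765d61d8327deb882cf99"}


def check_login(email: str, attempt: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    expected = stored_hashes.get(email)
    if expected is None:
        return False
    return hmac.compare_digest(md5_hex(attempt), expected)


def crack(target_hash: str, candidates):
    """Offline guessing: hash each candidate until one matches, or give up."""
    for guess in candidates:
        if md5_hex(guess) == target_hash:
            return guess
    return None


if __name__ == "__main__":
    print(check_login("alice@example.com", "password"))
    print(crack(stored_hashes["alice@example.com"], ["123456", "qwerty", "password"]))
```

The `crack` function is the whole attack in miniature: the attacker never reverses the hash, they just hash guesses until one matches. Everything else in this article is about producing a better list of guesses.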
PassGAN Enters the Fray
Attackers and security professionals are in a never-ending race to come up with better ways of generating lists of potential passwords to check. PassGAN (the technology I mentioned earlier) is, at the moment, the newest step in this journey. PassGAN uses a relatively new technique called a Generative Adversarial Network (GAN). When its output is combined with that of a conventional tool such as HashCat, it matches 18-24 percent more passwords than the conventional tool alone.
A GAN, specifically the type that PassGAN uses, can best be understood (simply) as two programs that work together. The first program is called the discriminator. It’s a deep convolutional neural network (in short, a system that learns patterns at increasingly abstract levels) that is fed millions of passwords. It learns patterns within the dataset and is then able to tell how similar a given string is to a password. It ultimately returns a number between 0 and 1, where 0 is not at all password-like and 1 is very much like a password.
The second program is called the generator, and it uses the discriminator to generate passwords. In simplified terms, it starts with a random string of text and fetches its score from the discriminator. The first time the generator checks in with the discriminator, it will likely get a very low score because it’s just a random string of characters. Next, the generator starts to modify the string. It changes part of it and then checks what score the new string receives. If the score goes up, the change is kept; if not, it is undone and a different change is tried. This process is repeated until the score reaches a given threshold.
The net result: computing power and a list of known passwords go in, and new, relatively plausible potential passwords come out.
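The simplified mutate-and-score loop described above can be sketched in a few lines of Python. To be clear about the assumptions: this is a toy, not PassGAN. The "discriminator" here is a hypothetical stand-in that just counts how many adjacent character pairs appear in a tiny made-up password list, and the "generator" is a plain hill climb. A real GAN uses two neural networks that are both trained with gradient-based optimization.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits

# Tiny illustrative "training set"; a real model would see millions of leaked passwords.
KNOWN_PASSWORDS = ["password1", "letmein123", "dragon2017", "iloveyou12"]


def bigrams(text):
    """Return the set of adjacent character pairs in a string."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


KNOWN_BIGRAMS = set().union(*(bigrams(p) for p in KNOWN_PASSWORDS))


def toy_score(candidate: str) -> float:
    """Stand-in for the discriminator: the fraction (0.0 to 1.0) of the
    candidate's adjacent character pairs that occur in the known passwords."""
    pairs = [candidate[i:i + 2] for i in range(len(candidate) - 1)]
    if not pairs:
        return 0.0
    return sum(pair in KNOWN_BIGRAMS for pair in pairs) / len(pairs)


def generate(length=8, threshold=0.9, max_steps=5000, seed=0):
    """Stand-in for the generator: start from a random string, change one
    character at a time, and keep a change only if the score goes up."""
    rng = random.Random(seed)
    candidate = "".join(rng.choice(ALPHABET) for _ in range(length))
    start = candidate
    score = toy_score(candidate)
    for _ in range(max_steps):
        if score >= threshold:
            break
        pos = rng.randrange(length)
        mutated = candidate[:pos] + rng.choice(ALPHABET) + candidate[pos + 1:]
        new_score = toy_score(mutated)
        if new_score > score:  # keep the change; otherwise it is discarded
            candidate, score = mutated, new_score
    return start, candidate, score


if __name__ == "__main__":
    start, final, score = generate()
    print(f"start={start!r} final={final!r} score={score:.2f}")
```

Because the scorer only knows about character pairs, the strings it rewards look vaguely password-shaped rather than like real passwords. The pattern-learning that makes PassGAN’s guesses useful lives in the trained networks, which this sketch deliberately leaves out.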
How Good and Bad Actors Can Use PassGAN
As with any innovation in security, PassGAN will likely be used by bad actors to attack systems through password reuse campaigns, as is discussed in this podcast episode. Password reuse poses some danger to normal web users. That being said, the main thing you will want in order to protect yourself from people stealing your accounts is more than just a password standing between an attacker and those accounts. The best way to do this is to use multi-factor authentication (MFA). Many sites offer this. If you’d like to see a list of them and search for sites you use, check out twofactorauth.org.
Of course, PassGAN will also be useful for red teamers who are trying to crack local passwords, such as Windows SAM hashes or Linux /etc/shadow password hashes, with the purpose of strengthening organizations’ security. If you’re a red teamer and are interested in using this, there’s an open source implementation of PassGAN on GitHub courtesy of Brannon Dorsey that can be found here: https://github.com/brannondorsey/PassGAN.
When doing this, be aware that the 18-24 percent increase mentioned above came from combining PassGAN’s output with HashCat’s, not from PassGAN alone. As the researchers put it: “PassGAN’s output was comparable to (in the case of HashCat), or better than (in the case of John the Ripper) that of password rules; (3) our results also show that PassGAN can be used to complement password generation rules. In our experiments, we successfully used PassGAN to generate password matches that were not generated by any password rule. When we combined the output of PassGAN with the output of HashCat, we were able to match between 18 percent and 24 percent additional unique passwords compared to HashCat alone; and (4) in contrast with password generation rules, PassGAN can generate a practically unbounded number of password guesses. Our experiments show that the number of new (unique) password guesses increases steadily with the overall number of passwords generated by the GAN.”
This is important because currently the number of unique passwords generated using rules is ultimately bounded by the size of the password dataset used to instantiate these rules. It may also be useful to run the output of PassGAN through a standard mutation-based password rule set for greater coverage. If you want to get more into the technical details and how to implement PassGAN, check out this highlighted version of the original paper.
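If you want to measure what a second guess list adds in your own engagement, the bookkeeping is simple. The sketch below shows one plausible way to compute an "additional unique matches" percentage relative to a rule-based baseline. The function names and all of the sample passwords are made up for illustration, MD5 again stands in for whatever hash you are actually testing, and this is not the researchers’ evaluation code.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Hex MD5 digest; a stand-in for whichever hash is being tested."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def matches(candidates, target_hashes):
    """Return the set of candidates whose hash appears in target_hashes."""
    return {c for c in candidates if md5_hex(c) in target_hashes}


def additional_match_percent(rule_guesses, gan_guesses, target_hashes):
    """Percentage of extra unique matches gained by adding the second list,
    measured relative to what the rule-based list matched on its own."""
    baseline = matches(rule_guesses, target_hashes)
    if not baseline:
        raise ValueError("baseline matched nothing; percentage is undefined")
    combined = baseline | matches(gan_guesses, target_hashes)
    return 100.0 * (len(combined) - len(baseline)) / len(baseline)


if __name__ == "__main__":
    # Made-up data: five target hashes and two small guess lists.
    targets = {md5_hex(p) for p in
               ["password1", "summer2017", "dragon99", "qwerty12", "monkey7"]}
    rule_guesses = ["password1", "summer2017", "dragon99", "qwerty12", "abc"]
    gan_guesses = ["monkey7", "password1", "zzz"]
    print(additional_match_percent(rule_guesses, gan_guesses, targets))
```

Counting sets rather than raw hits matters here: a guess that both lists produce (like `password1` in the sample data) is only counted once, so the percentage reflects passwords the second list found that the first did not.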
Closing Password Best Practices for Enterprises
If you manage an enterprise, you should enforce strong password policies and try to move away from passwords and towards passphrases. The benefit of a passphrase over a password is that you get a much higher ratio of entropy to memorization cost for the user. When coming up with a password or telling people how to come up with a password, you want as much entropy (possible combinations) as possible at the lowest cost of memorization to the user. Take the following password, for example: *dJoeo30(#JS)3%$. This is a great password from an entropy standpoint. It has everything. It’s 16 characters, and it has uppercase letters, lowercase letters, symbols, and numbers. Each character could be one of 95 possible characters. This means there are 95^16, or roughly 4.4 x 10^31 (a 4 followed by 31 more digits), possible values this password could have. Even at 1 trillion guesses a second, trying them all would still take more than a trillion years.
This is great, but nobody can remember it. Our brains simply aren’t made to remember random strings of symbols that don’t mean anything to us. This is where a lot of the issues that PassGAN exploits come from. Humans make passwords that are easy for humans to remember, which makes them more predictable. That’s especially true when you’re trying to make something easy to remember that also has to fit a given set of complexity criteria.
This results in passwords like P@a$$wordP@a$$word1. Easy to remember, easy to guess, but at least it checks every box. What’s much more useful is something that has a lot of entropy and is naturally easy to remember. This is where the passphrase comes in. Instead of using arbitrary symbols that our brains have no natural infrastructure to remember, we can use things with lots of entropy that our brains are great at remembering: sentences. For example, “the album was good the weather was cold.” That’s so easy to remember that you’ve probably already unintentionally remembered it. This has 1.18 x 10^69 (a 1 with 69 zeros after it) possible things that it could be if you just guess random letters. If we assume an attacker knows the person is using a passphrase of English words (and we assume that the average person has a vocabulary of 20,000 words), we have 20,000^8, or roughly 2.56 x 10^34, possible combinations. Passphrases give us a massive amount of entropy at a low human cost of memorization.
Next, consider how users come to have a given password in your enterprise. Should you expect your front desk person to understand how to create a safe password? Should you expect everyone to always pick a good password? If you’re giving users their passwords, are you giving them passwords they’re going to remember, or are they going to write them on a Post-It note? You may want to consider picking passwords for users and making those passwords passphrases.
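The search-space arithmetic for the 16-character random password and the eight-word passphrase can be checked with a few lines of Python. This is a back-of-the-envelope sketch under the same assumptions as the text (95 printable characters, a 20,000-word vocabulary, words chosen independently, 1 trillion guesses per second); the function names are my own.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def search_space(symbols: int, length: int) -> int:
    """Number of possible strings of `length` items drawn from `symbols` choices."""
    return symbols ** length


def entropy_bits(space: int) -> float:
    """Size of the search space expressed in bits."""
    return math.log2(space)


def years_to_exhaust(space: int, guesses_per_second: float) -> float:
    """Time to try every possibility at a fixed guessing rate, in years."""
    return space / guesses_per_second / SECONDS_PER_YEAR


random_password = search_space(95, 16)   # 16 characters, 95 options each
passphrase = search_space(20_000, 8)     # 8 words, 20,000 options each

if __name__ == "__main__":
    for name, space in [("random password", random_password),
                        ("passphrase", passphrase)]:
        print(f"{name}: {space:.3e} possibilities, "
              f"{entropy_bits(space):.1f} bits, "
              f"{years_to_exhaust(space, 1e12):.3e} years at 1e12 guesses/s")
```

Under these assumptions the eight-word passphrase has a search space several hundred times larger than the random 16-character password, while being far easier to remember. The comparison only holds if the words are chosen independently; a sentence that follows predictable grammar or comes from a well-known source has less entropy than the raw count suggests.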
About the Author: Nick McKenna is a student researcher who has had an interest in cyber security for the past five years. Nick likes seeing how things work and trying to break them. If you have any questions, you can contact Nick here. Editor’s Note: The opinions expressed in this guest author article are solely those of the contributor, and do not necessarily reflect those of Tripwire, Inc.