Meet PassGPT, the AI Trained on Millions of Leaked Passwords

It shares a name with a past April Fool's gag, but this PassGPT is no joke—and the AI model could help develop more secure passwords.

Jun 9, 2023

Researchers from ETH Zürich, the Swiss Data Science Center, and SRI International in New York have used OpenAI's GPT-2 architecture to build PassGPT, a password-guessing model based on a large language model (LLM) and trained on a trove of passwords leaked in various hacks and exploits.

The main goal of PassGPT is to learn the patterns hidden in human-generated passwords, both to help users choose stronger, more complex passwords and to identify likely passwords given a set of inputs. The model's innovation lies not only in its predictive ability, but also in how it generates passwords.

Whereas previous models generated each password as a complete unit, PassGPT uses a strategy called progressive sampling: it constructs passwords character by character, choosing each new character based on the ones already generated. The model was trained on a collection of millions of previously leaked passwords.
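To make the character-by-character idea concrete, here is a minimal Python sketch. It is not PassGPT: a simple bigram (previous-character) model stands in for the GPT-2 network, and the toy corpus, the `^`/`$` boundary markers, and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary markers, not from PassGPT

def train_bigram(passwords):
    """Count how often each character follows the previous one."""
    counts = defaultdict(Counter)
    for pw in passwords:
        prev = START
        for ch in pw + END:
            counts[prev][ch] += 1
            prev = ch
    return counts

def sample_password(counts, rng, max_len=16):
    """Build one password a character at a time until END is drawn."""
    out, prev = [], START
    while len(out) < max_len:
        chars, weights = zip(*counts[prev].items())
        ch = rng.choices(chars, weights=weights, k=1)[0]
        if ch == END:
            break
        out.append(ch)
        prev = ch
    return "".join(out)

# Made-up toy corpus, for illustration only
corpus = ["password1", "pass1234", "letmein", "dragon12", "monkey1"]
model = train_bigram(corpus)
rng = random.Random(0)
guesses = [sample_password(model, rng) for _ in range(5)]
print(guesses)
```

A real model of this kind conditions on the whole prefix rather than just the previous character, but the sampling loop has the same shape: pick a character, append it, repeat until an end marker appears.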

“Trained on the RockYou leak, PassGPT can guess 20% more unseen passwords than state-of-the-art GAN models,” creator Javi Rando remarked.

Think of generative adversarial networks (GANs) as a match between two networks. One, the generator, tries to create content realistic enough to fool the other, the discriminator, which tries to detect when it is being shown artificial content. With every round of this match, each network learns from its mistakes and improves. The overall quality keeps rising until the discriminator can hardly tell real content from what the generator creates.
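The back-and-forth between the two networks can be sketched in a few lines of Python. This is a deliberately tiny toy, assuming one-dimensional "data", a linear generator, and a logistic discriminator with hand-written gradients. None of it comes from the password-guessing GANs mentioned in the article, and the learning rate and step count are arbitrary.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

rng = random.Random(0)
a, b = 1.0, 0.0   # generator: fake = a * noise + b
w, c = 0.0, 0.0   # discriminator: D(x) = sigmoid(w * x + c)
lr = 0.05         # learning rate (arbitrary choice)

for step in range(500):
    real = rng.gauss(4.0, 0.5)   # stand-in for "real" data
    z = rng.gauss(0.0, 1.0)      # random noise fed to the generator
    fake = a * z + b

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w -= lr * ((d_real - 1.0) * real + d_fake * fake)
    c -= lr * ((d_real - 1.0) + d_fake)

    # Generator step: push D(fake) toward 1 (non-saturating loss)
    d_fake = sigmoid(w * fake + c)
    a -= lr * (d_fake - 1.0) * w * z
    b -= lr * (d_fake - 1.0) * w

print(f"generator offset b = {b:.2f}, discriminator weight w = {w:.2f}")
```

Each loop iteration is one "round" of the match: the discriminator updates first, then the generator adjusts using the discriminator's current judgment.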

Rando also pointed to a property that sets PassGPT apart, explaining that it’s “an explicit generative model, allowing us to access the modeled distribution and compute the probability of any given password under the model. We leverage this capability to analyze password strength vulnerabilities.”
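What "compute the probability of any given password" means can be illustrated with a small sketch. The version below again assumes a toy bigram model in place of PassGPT, with add-one smoothing and an extra "unknown character" bucket so unseen characters still get a nonzero probability. The corpus and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary markers

def train_bigram(passwords):
    """Return next-character counts and the set of characters seen."""
    counts = defaultdict(Counter)
    vocab = {END}
    for pw in passwords:
        prev = START
        for ch in pw + END:
            counts[prev][ch] += 1
            vocab.add(ch)
            prev = ch
    return counts, vocab

def log_prob(password, counts, vocab):
    """Chain rule: sum of log P(char | previous char), add-one smoothed."""
    total, prev = 0.0, START
    for ch in password + END:
        seen = counts.get(prev, Counter())
        numerator = seen[ch] + 1
        # +1 reserves probability mass for characters never seen in training
        denominator = sum(seen.values()) + len(vocab) + 1
        total += math.log(numerator / denominator)
        prev = ch
    return total

# Made-up toy corpus, for illustration only
corpus = ["password1", "pass1234", "letmein", "dragon12", "monkey1"]
counts, vocab = train_bigram(corpus)

common = log_prob("password1", counts, vocab)
unusual = log_prob("xQ7#vL2!", counts, vocab)
print(common > unusual)  # the familiar pattern scores as more likely
```

A higher log-probability means the model considers the password easier to guess, which is the kind of signal that can be compared against a conventional strength estimator's verdict.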

🔐 Introducing PassGPT🔓

Trained on password leaks, PassGPT can generate 20% more unseen passwords than existing GAN methods.

📖 https://t.co/xV3wr4NGCs

Joint work with @fperezcruz and @BrilandHitaj.
🧵 Let's dive into our key contributions. 🧵

— Javi Rando (@javi_rando) June 6, 2023

PassGPT is notably good at surfacing passwords that strength estimators rate as strong but that are relatively easy to guess with generative techniques.

“Non-English passwords are hard for dictionary-based heuristics, yet PassGPT learns patterns across multiple languages,” Rando explained. This multilingual proficiency sets a new benchmark in password security research. The model also proved able to guess new passwords that were not part of its training dataset.

Notably, LLMs like PassGPT can be tailored to specific applications by training them on different datasets. Case in point: Google is training an LLM on medical data, while other intriguing results have emerged from LLMs trained on topics as varied as the politically incorrect language of 4chan and the speech styles of popular YouTubers.

Interestingly, password leaks are not merely a boon for hackers seeking system access. They also give researchers an opportunity to examine hidden patterns in user-generated passwords, with the potential to improve password-cracking tools. That is the paradox of password security: the same leaks that expose users also help researchers study how to protect them.

Machine learning (ML) has proven instrumental in extracting insights from large password breaches. Those insights drive advances in password guessing and in the tuning of password strength estimation algorithms.

Against this backdrop, large language models (LLMs) have made significant strides in processing and understanding natural language, with the generative pre-trained transformer (GPT) models, PaLM, and LLaMA at the forefront.

Note that while this PassGPT is a legitimate creation, there was previously an April Fool's Day joke of the same name—so be careful while doing your own research.

PassGPT is further proof that there is increasingly an AI for everything. And with AI like PassGPT at work, you might soon find your cat's name combined with your birthdate is no longer the indecipherable fortress of a password you once thought it was.
