Meet PassGPT, the AI Trained on Millions of Leaked Passwords

It shares a name with a past April Fools' gag, but this PassGPT is no joke, and the AI model could help develop more secure passwords.

Jun 9, 2023

Researchers from ETH Zürich, the Swiss Data Science Center, and SRI International in New York have used OpenAI's GPT-2 architecture to develop PassGPT, a password-guessing model built on a large language model (LLM) and trained on a trove of passwords leaked in various hacks and exploits.

PassGPT is meant to learn the patterns hidden in human-generated passwords, with the aim of giving users stronger and more complex passwords and of identifying likely passwords from a given set of inputs. The model's innovation lies not only in its predictive ability, but also in how it generates passwords.

Whereas previous models generated passwords as complete entities, PassGPT introduces a different strategy: progressive sampling. This method constructs passwords character by character, with each new character chosen in light of the ones already generated. The model was trained on a collection of millions of previously leaked passwords.
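The character-by-character idea can be illustrated with a much simpler stand-in. The sketch below is not PassGPT: it swaps the GPT-2 network for a toy bigram model that only looks at the previous character, and the training list, the `START`/`END` markers, and the function names are all made up for illustration.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary markers, not PassGPT's real tokens

def train_bigram(passwords):
    """Count how often each character follows the previous one."""
    counts = defaultdict(Counter)
    for pw in passwords:
        prev = START
        for ch in pw + END:
            counts[prev][ch] += 1
            prev = ch
    return counts

def sample_password(counts, rng, max_len=16):
    """Build one password a character at a time, stopping at END or max_len."""
    out, prev = [], START
    while len(out) < max_len:
        chars, weights = zip(*counts[prev].items())
        ch = rng.choices(chars, weights=weights, k=1)[0]
        if ch == END:
            break
        out.append(ch)
        prev = ch
    return "".join(out)

# Made-up training data for illustration only.
TOY_LEAK = ["password1", "passw0rd", "letmein", "dragon12", "pass1234"]
model = train_bigram(TOY_LEAK)
rng = random.Random(0)  # fixed seed so repeated runs give the same samples
samples = [sample_password(model, rng) for _ in range(5)]
print(samples)
```

A real language model conditions each character on everything generated so far rather than on one previous character, but the loop has the same shape: pick the next character from a learned distribution, append it, and repeat until the model signals the end.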

"Trained on the RockYou leak, PassGPT can guess 20% more unseen passwords than state-of-the-art GAN models," creator Javi Rando remarked.

Imagine generative adversarial networks (GANs) as a match between two networks. One, the Generator, tries to create content so realistic that it can fool the other, the Discriminator, which aims to detect when it's being presented with artificial content. With every round of this match, each network learns from its mistakes and improves. The overall quality rises until the Discriminator can hardly tell what's real from what the Generator has created.

Rando also highlighted another property of PassGPT, explaining that it's "an explicit generative model, allowing us to access the modeled distribution and compute the probability of any given password under the model. We leverage this capability to analyze password strength vulnerabilities."
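To make "compute the probability of any given password under the model" concrete, here is a minimal sketch that again uses a toy bigram model as a stand-in for PassGPT. The training list, the smoothing constant `alpha`, the `ALPHABET` definition, and all function names are assumptions chosen for illustration, not details from the researchers' work.

```python
import math
import string
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary markers
# Assumed alphabet: printable ASCII letters, digits, and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation
VOCAB_SIZE = len(ALPHABET) + 1  # +1 for the end-of-password symbol

def train_bigram(passwords):
    """Count how often each character follows the previous one."""
    counts = defaultdict(Counter)
    for pw in passwords:
        prev = START
        for ch in pw + END:
            counts[prev][ch] += 1
            prev = ch
    return counts

def log_prob(counts, password, vocab_size, alpha=0.01):
    """Chain rule: sum of log P(next char | previous char), add-alpha smoothed."""
    total, prev = 0.0, START
    for ch in password + END:
        seen = counts.get(prev, Counter())
        numer = seen[ch] + alpha
        denom = sum(seen.values()) + alpha * vocab_size
        total += math.log(numer / denom)
        prev = ch
    return total

# Made-up training data for illustration only.
TOY_LEAK = ["password1", "passw0rd", "letmein", "dragon12", "pass1234"]
model = train_bigram(TOY_LEAK)
for candidate in ["password1", "pass12", "zq!Xv9"]:
    print(candidate, round(log_prob(model, candidate, VOCAB_SIZE), 2))
```

A higher (less negative) log-probability means the model considers the password more likely, so a guessing attack driven by the same model would tend to reach it sooner. Comparing that score with the verdict of a conventional strength meter is one way to spot passwords that look strong but follow learnable patterns.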

PassGPT has a knack for unearthing patterns that password strength estimators deem strong but that are relatively easy to guess using generative techniques.

"Non-English passwords are hard for dictionary-based heuristics, yet PassGPT learns patterns across multiple languages," Rando explained. This multilingual proficiency sets a new benchmark in password security research. The model also proved able to guess new passwords that were not part of its training dataset.

Notably, LLMs like PassGPT can be custom-tailored using different datasets for specific applications. Case in point: Google is training an LLM on medical data, while other intriguing results have emerged from LLMs trained on material as varied as the politically incorrect language of 4chan and the speech styles of popular YouTubers.

Password leaks are not merely a boon for hackers seeking system access. They also give researchers an opportunity to examine hidden patterns in user-generated passwords, with the potential to enhance password-cracking tools. That is the paradox of password security: the same leaked data serves attackers and defenders alike.

Machine learning (ML) has proven instrumental in extracting insights from large password breaches. Those insights drive advances in password guessing and in the fine-tuning of password strength estimation algorithms.

Against this backdrop, large language models (LLMs) have made significant strides in processing and understanding natural language, with transformer-based models such as GPT, PaLM, and LLaMA at the forefront.

Note that while this PassGPT is a legitimate creation, there was previously an April Fools' Day joke of the same name, so be careful while doing your own research.

PassGPT is further proof that there is increasingly an AI for everything. And with AI like PassGPT at work, you might soon find your cat's name combined with your birthdate is no longer the indecipherable fortress of a password you once thought it was.
