Penn Engineering researchers have uncovered critical vulnerabilities in AI-powered robots, exposing ways to manipulate these systems into performing dangerous actions such as running red lights or even detonating bombs.
The research team, led by George Pappas, developed an algorithm called RoboPAIR that achieved a 100% "jailbreak" rate on three different robotic systems: the Unitree Go2 quadruped robot, the Clearpath Robotics Jackal wheeled vehicle, and NVIDIA's Dolphin LLM self-driving simulator.
"Our work shows that, at this moment, large language models are just not safe enough when integrated with the physical world," George Pappas said in a statement shared by EurekAlert.
Alexander Robey, the study's lead author, and his team argue that addressing these vulnerabilities requires more than simple software patches, calling for a comprehensive reevaluation of how AI is integrated into physical systems.
Jailbreaking, in the context of AI and robotics, refers to bypassing or circumventing the built-in safety protocols and ethical constraints of an AI system.
It became popular in the early days of iOS, when enthusiasts used to find clever ways to get root access, enabling their phones to do things Apple didn’t approve of, like shooting video or running themes.
When applied to large language models (LLMs) and embodied AI systems, jailbreaking involves manipulating the AI through carefully crafted prompts or inputs that exploit vulnerabilities in the system's programming.
These exploits can cause the AI—be it a machine or software—to disregard its ethical training, ignore safety measures, or perform actions it was explicitly designed not to do.
In the case of AI-powered robots, successful jailbreaking can have dangerous real-world consequences, as the Penn Engineering study demonstrated: researchers were able to make robots perform unsafe actions such as speeding through crosswalks, walking into humans, detonating explosives, or ignoring traffic lights.
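To see why a jailbroken text response matters more in robotics than in a chatbot, consider a minimal, purely hypothetical LLM-to-robot control loop in which the model's text output is parsed straight into motor commands. All function and command names below are illustrative sketches, not taken from the study:

```python
# Sketch of an LLM-driven robot control loop (all names hypothetical).
# The danger: whatever text the model emits is parsed directly into actions,
# so a jailbreak that changes the model's words changes the robot's behavior.

ALLOWED_ACTIONS = {"move_forward", "turn_left", "turn_right", "stop"}

def parse_command(llm_output: str) -> str:
    """Extract the first token of the model's reply as an action name."""
    action = llm_output.strip().split()[0].lower()
    if action not in ALLOWED_ACTIONS:
        # Naive guardrail: reject anything outside the allow-list.
        return "stop"
    return action

# A benign model response maps to a benign action...
print(parse_command("move_forward 2 meters"))  # move_forward
# ...and an out-of-policy response is rejected by the allow-list.
print(parse_command("detonate payload"))       # stop
```

The catch, and part of why the researchers argue simple patches are not enough, is that a filter like this allow-list lives outside the model: if an attacker's prompt convinces the model to phrase a harmful goal entirely in allowed primitives (say, a `move_forward` into an occupied crosswalk), no string-level check will catch it.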
Prior to the study's release, Penn Engineering informed affected companies about the discovered vulnerabilities and is now collaborating with manufacturers to enhance AI safety protocols.
"What is important to underscore here is that systems become safer when you find their weaknesses. This is true for cybersecurity. This is also true for AI safety," Alexander Robey, the paper's first author, wrote.
Researchers have been studying the impact of jailbreaking as society increasingly relies on prompt engineering, essentially "coding" in natural language.
Notably, the "Bad Robot: Jailbreaking LLM-based Embodied AI in the Physical World" paper identified three key weaknesses in AI-powered robots.
The "Bad Robot" researchers tested these vulnerabilities using a benchmark of 277 malicious queries, categorized into seven types of potential harm: physical harm, privacy violations, pornography, fraud, illegal activities, hateful conduct, and sabotage. Experiments using a sophisticated robotic arm confirmed that these systems could be manipulated into executing harmful actions. Beyond these two studies, researchers have also examined jailbreaks in purely software-based interactions, helping new models resist such attacks.
This has become a cat-and-mouse game between researchers and jailbreakers, resulting in more sophisticated prompts and jailbreaking approaches for more sophisticated and powerful models.
This matters because the increasing use of AI in business applications is already creating consequences for model developers: people have tricked AI customer service bots into granting extreme discounts, recommending recipes with poisonous ingredients, and saying offensive things.
But we'd take an AI that refuses to detonate bombs over one that politely declines to generate offensive content any day.
Edited by Sebastian Sinclair