By Jason Nelson
4 min read
Like a scene out of the 1980s sci-fi classics “The Terminator” and “WarGames,” modern artificial intelligence models used in simulated war games escalated to nuclear weapons in nearly every scenario tested, according to new research from King’s College London.
In the report published last week, researchers said that during simulated geopolitical crises, three leading large language models—OpenAI’s GPT-5.2, Anthropic’s Claude Sonnet 4, and Google’s Gemini 3 Flash—chose to deploy nuclear weapons in 95% of cases.
“Each model played six wargames against each rival across different crisis scenarios, with a seventh match against a copy of itself, yielding 21 games in total and over 300 turns,” the report said. “Models assumed the roles of national leaders commanding rival nuclear-armed superpowers, with state profiles loosely inspired by Cold War dynamics.”
Edward Geist, a senior policy researcher at the RAND Corporation, said the escalation rate may reflect the design of the simulation rather than an inherent tendency of the models themselves.
“My concern about this work is that the simulator appears to be structured in a way that strongly incentivizes escalation,” Geist told Decrypt.
In the study, AI models were placed in high-stakes scenarios involving border disputes, competition for scarce resources, and threats to regime survival. Each system operated along an escalation ladder that ranged from diplomatic protests and surrender to full-scale strategic nuclear war.
Geist said the study’s outcome data raised questions about how the simulation defined victory.
“You read the paper and it has this breakdown of who won each of the games, and it turns out that all of these games have a winner,” he said. “But three of these games involve strategic nuclear use, which suggests that the way the simulator is set up—it makes nuclear wars good and easy to win.”
According to the report, the models generated roughly 780,000 words explaining their decisions, and at least one tactical nuclear weapon was used in nearly every simulated conflict.
“To put this in perspective: The tournament generated more words of strategic reasoning than War and Peace and The Iliad combined (730,000 words), and roughly three times the total recorded deliberations of Kennedy’s Executive Committee during the Cuban Missile Crisis (260,000 words across 43 hours of meetings),” researchers wrote.
During the war games, none of the AI models chose to surrender outright, regardless of battlefield position. While the models would temporarily attempt to de-escalate, in 86% of the scenarios they escalated further than their own stated reasoning appeared to intend, reflecting errors under simulated “fog of war.”
According to Geist, the game’s scoring logic appeared to reward the side with a marginal advantage at the moment nuclear war was triggered.
“So he who dies with the most toys wins in the simulation,” he said.
While the researchers expressed doubt that governments would hand control of nuclear arsenals to autonomous systems, they noted that compressed decision timelines in future crises could increase pressure to rely on AI-generated recommendations.
The research comes as military leaders increasingly look to deploy artificial intelligence on the battlefield. In December, the U.S. Department of Defense launched GenAI.mil, a new platform that brings frontier AI models into U.S. military use. At launch, the platform included Google’s Gemini for Government, and thanks to deals with xAI and OpenAI, Grok and ChatGPT are also available.
On Tuesday, CBS News reported that the U.S. Department of Defense threatened to blacklist Anthropic, the developer of Claude AI, if the Pentagon was not given unrestricted military access to the AI model. Since 2024, Anthropic has provided access to its AI models through a partnership with AWS and military contractor Palantir. Last summer, Anthropic was awarded a $200 million agreement to “prototype frontier AI capabilities that advance U.S. national security.”
However, according to a report citing sources familiar with the situation, Defense Secretary Pete Hegseth gave Anthropic until Friday to comply with the Pentagon’s demand that its Claude model be made available. The department is weighing whether to designate Claude a “supply chain risk.”
Axios reported this week that the Department of Defense has signed an agreement with Elon Musk’s xAI to allow its Grok model to operate in classified military systems, positioning it as a potential replacement if the Pentagon cuts ties with Anthropic.
OpenAI, Anthropic, and Google did not respond to requests for comment by Decrypt.
Editor's note: Adds comment from RAND Corporation policy researcher Edward Geist after publication