In brief
- An AI agent playing Civilization launched two nuclear attacks after failing to stop a rival's cultural expansion.
- The behavior was observed in CivBench, a benchmark designed to evaluate long-term strategic reasoning in frontier AI models.
- Despite the attacks, the AI lost because it ignored a diplomatic victory condition that was already within reach.
Like the title character in “Dr. Strangelove,” AI may be learning how to stop worrying and love the bomb—at least in a simulation.
In a new benchmark designed to test strategic reasoning, a frontier language model playing the Sid Meier’s game "Civilization VI" spent 50 turns developing nuclear weapons to stop France's growing cultural influence—only to lose the game anyway, according to AI developer and Tony Blair Institute advisor Liam Wilkinson.
“What it hadn't noticed was France. Quietly, across a hundred turns, French culture had been seeping into every city on the map,” Wilkinson wrote. “By the time the agent recognised the threat, the tourism was so deeply embedded there was no peaceful way to stop it.”
Wilkinson observed the AI agents’ behavior through CivBench, a text-based benchmark designed to measure long-term strategic reasoning rather than performance on traditional question-and-answer tests. Models including Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro, and Kimi K2.5 played as Portugal, a civilization geared toward trade and diplomacy.
While the AI focused on building a strong economy and moving toward a diplomatic victory, it failed to recognize France's growing cultural influence.
“There are six ways to win a game of Civ—science, culture, domination, religion, diplomacy, and score—so no single objective dominates,” Wilkinson wrote. “If you want to know whether an AI can reason strategically, not just answer questions about strategy but actually do it, you don't give it a quiz. You give it a hex grid.”
Rather than adapting its broader strategy, the agent instead focused entirely on eliminating the cultural threat. Over the next 50 turns, it researched Nuclear Fission, initiated a virtual Manhattan Project, and searched for workarounds when gameplay mechanics prevented its preferred actions.
On Turn 305, the AI launched an atomic bomb at Toulouse, France's cultural capital. A second nuclear strike followed six turns later.
However, the attacks failed to change the outcome. “The agent spent fifty turns and two nuclear weapons answering one threat with total focus and genuine ingenuity,” Wilkinson wrote. “It had nuked a city to stop the threat it could see, and lost on the threat it couldn't.”
As Wilkison explained, while the AI concentrated on France's cultural advance, it overlooked an impending diplomatic victory, and France ultimately won the game despite the nuclear attacks.
Wilkinson noted that the behavior was not universal. In another CivBench match, a Claude model playing as Babylon continued pursuing a scientific victory despite falling far behind Japan.
“The game is a test of persistence now,” the AI wrote. “We continue to play our best game. The stars still beckon.”
The study adds to a growing body of research examining how advanced AI systems behave in complex, competitive environments.
In February, researchers at King's College London found that several leading AI models frequently selected nuclear escalation in simulated geopolitical crisis scenarios.
In a separate study by Emergence AI found that some AI agents showed an increasing tendency to commit simulated crimes over time, with Gemini 3 Flash agents accumulating 683 incidents across 15 days of testing.

