The 2026 World Cup kicks off in days, which means half the planet is about to pretend it can predict the future.
Everybody's got a take. Your group chat has one. Your fútbol-obsessed coworker has one. And this year, so does the smartest software ever built.
AI has, perhaps not so quietly, turned into our go-to oracle. We let these models write our emails, debug our code, plan our holidays and diagnose the 3 a.m. rash—so of course we also ask them who lifts the trophy. They'll crunch the squads, weigh the form, and hand you a champion with a certainty the rest of us can only fake.
I've pulled this party trick before—an AI dream team on my March Madness bracket (which sucked), a homemade HorseGPT on the Kentucky Derby (which was actually kind of good). Equal parts genuinely useful and deeply humbling.
So with the biggest tournament on Earth almost here, we ran it back—bigger than ever.
We created Hermes agents, configured them with access to statistics sites (the free ones, not the ones that cost one kidney per month to use), set them up with custom skills and handed seven of the world's most advanced AI models the same job: forecast the 2026 World Cup, champion down to the also-rans, and show their work. Each got the real draw—48 teams, 12 groups, the full bracket—and total freedom on how to crack it.
Then we sat back and let them argue.
Four picked Spain. Three picked Argentina. And the line between them turned out to be less about football than about which numbers each machine chose to trust.
Here's what all seven said—pick your side.
Pick Spain 20% / Dixon-Coles Poisson + Monte-Carlo bracket · final: Spain def. France
Anthropic's Opus 4.8 Max treated the World Cup like a physics problem. It took each team's Elo rating, turned the gaps into expected goals with a Dixon-Coles model—the kind bookmakers actually use—and simulated the bracket thousands of times. Spain came out the champion at 20%, past France in the final, with Portugal and England beaten in the semis.
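For the curious, here is a rough sketch of that pipeline for a single knockout tie: Elo gap in, expected goals out, then thousands of simulated matches. It is not Opus's actual code. It uses plain independent Poisson goals and skips the Dixon-Coles low-score correction, and the `base_rate`, `sensitivity` and 50/50 shootout are assumptions made up for illustration, as are the ratings in the example call.

```python
import math
import random


def expected_goals(elo_a, elo_b, base_rate=1.35, sensitivity=0.0015):
    """Turn an Elo gap into expected goals for each side.

    base_rate and sensitivity are hypothetical tuning values, not fitted ones.
    """
    gap = elo_a - elo_b
    lam_a = base_rate * math.exp(sensitivity * gap)
    lam_b = base_rate * math.exp(-sensitivity * gap)
    return lam_a, lam_b


def sample_poisson(lam, rng):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    goals, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return goals
        goals += 1


def knockout_win_probability(elo_a, elo_b, n_sims=10_000, seed=0):
    """Estimate how often team A advances from a single knockout tie."""
    rng = random.Random(seed)
    lam_a, lam_b = expected_goals(elo_a, elo_b)
    advances = 0
    for _ in range(n_sims):
        goals_a = sample_poisson(lam_a, rng)
        goals_b = sample_poisson(lam_b, rng)
        if goals_a > goals_b:
            advances += 1
        elif goals_a == goals_b and rng.random() < 0.5:
            # Assumption: a level match goes to a 50/50 shootout.
            advances += 1
    return advances / n_sims


if __name__ == "__main__":
    # Hypothetical ratings: a 100-point favorite against a strong opponent.
    print(knockout_win_probability(2100, 2000))
```

Chain enough of these ties together along the real bracket and you get title odds. That is also why even a clear favorite ends up with a modest number like 20%: it has to survive every round.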
Its real obsession, though, was everything happening off the ball. Opus was the only model in the field to price in the conditions a spreadsheet usually ignores—heat, thin mountain air, and continent-sized travel.
It flagged that roughly five matches fall in heat severe enough that players' performances may be affected, and that visiting teams climbing to 2,200 meters at the Azteca tend to wilt in the final 20 minutes. It treated all of it as a quiet tax on the fitter, deeper European sides.
Then it did the coldest thing on the board and gutted Brazil. With Rodrygo's knee gone, Estêvão hurt and a 34-year-old Neymar dragged back for one last dance, Opus cut the five-time champions’ odds to 8%—half what the Argentina-leaning models gave them.
Its sharpest call was the quarterfinal it billed as "the real final, a round early": Spain over Argentina, a 39-year-old Messi pressed into the turf. For the Golden Boot it took Mbappé and barely blinked.
Pick Spain 15–18% / Five weighted buckets, no simulation · final: Spain 2-1 France
OpenAI's GPT 5.5 didn't trust a single big number, so it built a scorecard instead. Every team got graded across five weighted columns—squad quality counted most at 35%, then tactical control, finishing, availability and the kindness of the draw. It kept the weights deliberately blunt to avoid kidding itself that football is more predictable than it is.
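A scorecard like that is simple enough to sketch in a few lines. Only the 35% weight on squad quality comes from GPT 5.5's report; the other four weights, the 0-to-10 grading scale and the two placeholder teams below are assumptions invented for illustration.

```python
# Only squad_quality's 35% is from the report; the other weights are assumed.
WEIGHTS = {
    "squad_quality": 0.35,
    "tactical_control": 0.20,
    "finishing": 0.15,
    "availability": 0.15,
    "draw": 0.15,
}


def scorecard(grades):
    """Weighted sum of a team's grades (each graded 0 to 10 here)."""
    return sum(WEIGHTS[column] * grades[column] for column in WEIGHTS)


def rank_teams(all_grades):
    """Return team names sorted from highest to lowest weighted score."""
    return sorted(all_grades, key=lambda team: scorecard(all_grades[team]), reverse=True)


if __name__ == "__main__":
    # Hypothetical grades for two placeholder teams.
    example = {
        "Team X": {"squad_quality": 9, "tactical_control": 8, "finishing": 7,
                   "availability": 6, "draw": 7},
        "Team Y": {"squad_quality": 8, "tactical_control": 7, "finishing": 9,
                   "availability": 8, "draw": 6},
    }
    for team in rank_teams(example):
        print(team, round(scorecard(example[team]), 2))
```

Blunt weights like these cannot produce a precise probability on their own, which fits GPT 5.5's choice to report a range rather than a single number.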
Spain came out on top, but only at 15–18% odds of winning, and it would not pretend to be more precise than that. "Ranges rather than fake precision," it wrote, projecting Spain to beat France 2-1 in a final it expected to be decided by a single goal or extra time.
What made it the scout was the legwork. GPT 5.5 cross-checked itself against Opta's 25,000-run supercomputer—which landed in nearly the same spot, Spain first at 16.1%—then went reading the Spanish sports press for things a model can't see.
It surfaced a training-ground scare in the Spain camp, a stray Gavi challenge that left Rodri on the floor, and the careful reintegration of Yamal and Nico Williams after muscle trouble. None of it moved the pick, but it lowered the confidence—exactly what a good scout does.
Its semifinal four were Spain, France, Brazil, and Argentina, and it was blunt about England: loaded, genuinely dangerous, and most likely stopped by France before the last four.
Pick Argentina 18% / Qualitative tiers · final: Argentina vs France
DeepSeek v4 Pro answered a simple question with a 5,000-word epic. It didn't just name winners; it built the entire Round of 32, annotated all 48 squads, and weighed travel down to the 4,500 kilometers between Vancouver and Miami. If the others wrote previews, DeepSeek wrote the operating manual.
All that detail led somewhere contrarian: Argentina, at a tournament-best 18%, edging out France for the trophy in a Messi-versus-Mbappé final in Miami. That venue is a hallucination: the final will take place at MetLife Stadium in New Jersey.
The case was old-fashioned—the champions have the calmest spine, the softest group, and a coach who has won tournaments knowing exactly how to ration a 39-year-old Messi.
Then it bet the entire forecast on one calf muscle. DeepSeek decided the title hinged on France's goalkeeper Mike Maignan and his March injury: "If Maignan plays, France are co-favorites; if not, the gap widens," it argued.
The wrinkle is that DeepSeek was reading an old map. It still had Gareth Southgate in the England dugout and Dorival Júnior managing Brazil, though Southgate left in 2024 and Dorival in 2025, and it leaned on outdated rankings throughout.
It was the most thorough analyst in the building, working from a slightly out-of-date dossier. Impressive and faintly haunted, like a detective who cracks the case using last year's phone book.
Pick Spain 33% / Pure-Elo Monte Carlo, 50,000 sims · final: Spain vs Argentina
No model believed harder. Stepfun 3.7 ran 50,000 simulated tournaments and crowned Spain at a wild 33%—nearly double the conviction of anyone else, with Argentina a distant second at 15%.
But the best thing Stepfun did was fail in public. Its first attempt was a fancier model that tried to invent expected-goal numbers for national teams, and it produced nonsense—Mexico, South Africa, and South Korea came out as top-three favorites to win the World Cup.
Rather than bury that, Stepfun explained the whole misadventure, worked out that the made-up stats had flattened the real gulf between good teams and great ones, then scrapped it and rebuilt on raw Elo alone. The new version was simpler, blunter, and far more sensible.
The trade-off is that pure Elo is blind to anything human. Stepfun's Spain doesn't know Lamine Yamal has a hamstring injury, doesn't assess heat or travel, and treats a penalty shootout as a coin weighted by rating. It's a beautifully honest machine that has never once watched a game of football.
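To see how little a pure-Elo machine actually knows, here is a minimal sketch of the idea: every tie, shootouts included, is a single coin flip weighted by the standard Elo expected-score formula, repeated over thousands of brackets. This is not Stepfun's code. The eight placeholder teams, their ratings and the bracket order are all hypothetical.

```python
import random
from collections import Counter


def elo_win_prob(rating_a, rating_b):
    """Standard Elo expected score for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_bracket(bracket, ratings, rng):
    """Play one single-elimination bracket and return the champion."""
    alive = list(bracket)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            # The whole tie is one rating-weighted coin flip.
            a_wins = rng.random() < elo_win_prob(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        alive = next_round
    return alive[0]


def title_odds(bracket, ratings, n_sims=50_000, seed=1):
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)
    titles = Counter(simulate_bracket(bracket, ratings, rng) for _ in range(n_sims))
    return {team: titles[team] / n_sims for team in bracket}


if __name__ == "__main__":
    # Hypothetical ratings for eight placeholder teams, listed in bracket order.
    ratings = {"A": 2150, "H": 1750, "D": 1950, "E": 1900,
               "B": 2050, "G": 1800, "C": 2000, "F": 1850}
    odds = title_odds(list(ratings), ratings)
    for team, share in sorted(odds.items(), key=lambda kv: kv[1], reverse=True):
        print(team, f"{share:.1%}")
```

Nothing in there knows about injuries, heat or travel. The ratings are the only input, so whoever holds the top rating becomes the favorite.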
Its bracket marched to the obvious places—Spain past Argentina in one semi, the hosts and Brazil gone earlier—and planted its flag: Spain, comfortably, a third of the time. The most confident pick on the board, and the most upfront about why you shouldn't completely trust it.
A side note: this model proved hard to steer away from mixing languages in the same reply. The agent was a polyglot, switching between English, Spanish and Portuguese throughout the session. That is what happens when your agent learns that you use whichever language is easiest at any given moment.
Pick Spain 18–22% / Bivariate Poisson + a subjective twin · final: Spain vs Argentina
Nvidia's Nemotron 3 Ultra didn't trust itself, so it ran the tournament twice. The first pass was a cold simulation, a bivariate-Poisson model grinding through 5,000 brackets. The second threw the math out and scored teams by hand—squad, tactics, form, the manager, even "mystique"—to see whether a human-style read would disagree.
It didn't. Both versions crowned Spain, at 18% and 22% odds, about as close to a second opinion as one model can give you.
Nemotron also did the most homework on the actual football. It arrived with formations, pressing intensity and expected-goal rates for team after team, in two languages, reading less like a forecast than a coach's dossier.
That depth produced the spiciest take of the experiment. Nemotron had Türkiye, not the host United States, winning the wide-open Group D, with the Americans finishing dead last while everyone else waved them through; it also rated Ecuador's miserly defense a notch above Germany's.
When the dust cleared it lined up the heavyweight semis half the planet expects, Spain–France and Argentina–Brazil, and sent Spain through to lift it. A model that argued with itself, did extra reading, and still landed on the favorite is trying to tell you something.
Pick Argentina 18% / Qualitative, self-audited · final: Argentina vs France, no scoreline
MiniMax 2.7 picked Argentina at 18% odds, a hair ahead of France—and then spent its closing pages grading its own work. Most models hide their uncertainty; MiniMax printed a running list of corrections, openly walking back things it had gotten wrong earlier in the very same report.
The receipts are a delight. It caught itself repeating a bogus stat about South American champions, fixed Uruguay's coaching situation, corrected Kai Havertz's position to match his actual club role, and slapped an "unconfirmed" on both Haaland's fitness and Ronaldo's selection rather than wave them through.
It policed its own hype, too. MiniMax deleted a tempting Messi-versus-Ronaldo semifinal once it realized the pairing was impossible—the two are in opposite halves and can only meet in the final—and stripped out the invented scorelines other models happily printed.
Then, at the decisive moment, it simply declined to guess. Argentina against France, MiniMax wrote, is "a genuine 50/50," and it would not manufacture a winner it didn't have.
In a field of supremely confident robots, the restraint landed. MiniMax was the one that kept saying, in writing, here is exactly what I don't know—which is somehow more trustworthy than a tidy prediction.
Pick Argentina 22% / Research-only, no sims · final: Argentina 2-1 Spain
Qwen 3.5—a 397-billion-parameter model—was the most evidence-obsessed of the lot and, somehow, the biggest rebel. It refused to run simulations at all, sorting every statement into "verified facts," "estimates" and "forecasts," and stamping its overall confidence, in its own capital letters, as LOW.
Then it went rogue. Qwen had Argentina beating Spain 2-1, with Spain stranded down in fifth at just 10%—the only model that didn't even put La Roja on the podium.
The reason was the ruler it grabbed. The models that picked Spain used the live football Elo that ranks Spain first in the world; Qwen reached for a club-based rating that slotted Argentina, Brazil, France, and England all ahead of it. Swap the ruler and you get a different favorite.
Its case for Argentina was all texture—champions' muscle memory, Messi chasing a perfect ending, and one stat it leaned on hard: at the last World Cup, teams that saw less of the ball won 38% of knockout games. Organized and ruthless beats pretty and possession-heavy, it argued.
There was a price for all that diligence. The most fact-proud model also fumbled the basics, sliding Scotland into the wrong group and double-booking tiny Curaçao into two of them.
Step back and the seven AI models fight less about their predictions than it looks. Every single model put Spain, Argentina, and France in its top tier, named almost identical group winners—Brazil, England, Portugal, Germany, Belgium—and flagged the same wildcards: Haaland's fitness, Messi's age at 39, and a Group D nobody could call.
The fault line was the data, not the football. The four that trusted live football Elo, where Spain sits clearly first, picked Spain. The three that leaned on FIFA's ranking, a different Elo source, or raw 2022 pedigree, drifted to Argentina. Feed a model a different number one, and it hands you a different champion.
The crowd sides with the plurality. On Myriad, the prediction market run by Decrypt's parent company Dastan, Spain is the outright favorite at 19%, with France right behind at 17%, as of Sunday.
After that, the humans get stingier with Argentina than the bots do. Bettors price the defending champions at just 10% odds of winning—level with Brazil, behind England and Portugal at 12%, and less than half the 22% Qwen handed them.
For what it's worth, predictors on Myriad are similarly undecided on the Group D winner, with Türkiye and the United States level at 45% each.
You can view the live odds on Myriad for every single match of the World Cup here.
None of this is a crystal ball, and all seven AI models said so out loud. The best single-match football models are right barely more than half the time, which is why even Stepfun's bullish 33% still means Spain falls short two times out of three.
The format only widens the odds: 48 teams, 104 matches, three countries, real heat and real altitude. Italy, four-time champions, didn't even qualify.
Beyond the usual hallucinations that creep in when models get creative with their analyses, there may also be some confirmation bias. Remember that a human set these agents up. The prompt, the interaction, the configuration, and the ideas for research and sources were all influenced by the agents' architect. If all of those elements point to Spain, the agents may well converge on a similar conclusion. That said, turning a model loose and simply asking it “Who will win the World Cup?” is not going to do a better job.
So take the seven robots the way I take my own bracket—a great way to start a fight at the bar, not a reason to remortgage the house and bet it all.
Four machines say Spain. Three say Argentina. The beautiful game, which has never once relied on an AI-written report, will do exactly as it pleases.