Even the Worst Version of Claude AI Is Better Than GPT 3.5, Researchers Say

Anthropic’s models are available for free—and every single one beats OpenAI’s free model according to a new global ranking.

Oct 5, 2023

The AI industry is watching a close contest between OpenAI's ChatGPT and Anthropic's Claude models. The Large Model Systems Organization (LMSYS Org, abbreviated here as LMSO), the group behind the Chatbot Arena and the Vicuna model, has just updated its Chatbot Arena Leaderboard, which shows how each AI chatbot measures up against its competitors. It turns out Anthropic is giving OpenAI a run for its money, even while its models are still free to use.

GPT-4, the powerhouse behind ChatGPT Plus and Bing AI, reigns supreme with the highest score, setting the gold standard for Large Language Models (LLMs). But as we move down the leaderboard, an unexpected underdog story unfolds. Anthropic's Claude models — Claude 1, Claude 2, and Claude Instant — all outperform GPT-3.5, the engine that powers the free version of ChatGPT. This implies that every Large Language Model developed by Anthropic can outclass the free version of ChatGPT.

The LMSO ranking puts numbers on the gap between these models. According to the leaderboard, GPT-4 tops the chart with an Arena Elo rating of 1181, while the Claude models follow with ratings ranging from 1119 to 1155. GPT-3.5 trails at 1115, four points below the lowest-rated Claude model.

To rank the models, the LMSO makes them “battle” in head-to-head matches on the same prompt. Users pick the answer they prefer, and that model wins while the other loses. Voters never get to know which models are competing.
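For readers curious how a rating gap translates into head-to-head results, here is a minimal sketch of the textbook Elo formulas in Python. It is an illustration only: the article does not describe exactly how the leaderboard computes its ratings, and the function names and the K-factor of 32 are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float,
                   k: float = 32.0) -> tuple[float, float]:
    """Return both ratings after one battle.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    k (assumed here to be 32) controls how far one result moves a rating.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total number of points stays constant.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Ratings quoted in the article: GPT-4 at 1181, GPT-3.5 at 1115.
    p = expected_score(1181, 1115)
    print(f"Expected share of wins for the 1181-rated model: {p:.1%}")
    print(update_ratings(1181, 1115, score_a=0.0))  # an upset moves both ratings
```

Under these textbook formulas, the 66-point gap between GPT-4 and GPT-3.5 corresponds to the higher-rated model being preferred in roughly six out of ten battles, which is why a few points of separation near the bottom of the Claude range amounts to a near coin flip.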

Comparison between different LLMs to rank them as the best AI. — Image: LMSO

As Decrypt previously reported, the difference in token processing capabilities between ChatGPT Plus and Claude Pro, although not a factor in the LMSO ranking, is also a major advantage that Claude models have over GPT.

"Claude Pro, based on the Claude 2 LLM, can process up to 100K tokens of information, while ChatGPT Plus, powered by the GPT-4 LLM, handles 8,192 tokens," we recalled. This differential in token processing ability underscores the edge Claude models hold in managing extensive contextual inputs, which is crucial for a nuanced and enriched user experience.

Moreover, Claude 2 has outperformed GPT on long prompts, handling larger inputs more efficiently. When prompts are comparable, however, Claude 1 and Claude Instant produce results similar to or slightly better than those of GPT-3.5. With Claude’s larger context window, a poor initial answer can be dramatically improved with a more refined, larger and richer prompt.

Open-source models are not far behind in this race.

WizardLM, a 70-billion-parameter model built on Meta’s Llama 2, stands out as the best open-source LLM. Close behind are Vicuna 33B and the original Llama 2, released by Meta.

🎉The @lmsysorg just updated the Chatbot Arena Leaderboard!

Our WizardLM-70B is now the🥇Top-1 open-source model on both ⚔️Arena Elo and 📈MT-bench.

❤️Main Contributors:@CanXu20 @victorsungo_ai @ChiYeung_Law @hpluo12 @tangmensan

Leaderboard: https://t.co/1gkZKGVutQ
Model… pic.twitter.com/bsJ0jv2i7I

— WizardLM (@WizardLM_AI) October 5, 2023

Open-source models play an important role in the development of the AI space for several reasons. They can be run locally, which gives users the opportunity to fine-tune them and engages the community in a collective effort to perfect the model. They are also cheaper to run due to their licenses, which is why the space has dozens of open-source LLMs and only a handful of proprietary models.

But the game of AI chatbots isn't solely about numbers. It's about real-world implications.

As chatbots become integral in various sectors from customer service to personal assistants, their efficacy, adaptability, and accuracy become paramount. With Claude models ranking higher than GPT-3.5, businesses and individual users might find themselves at a crossroads, evaluating which model aligns best with their needs. Decrypt has prepared two guides to help you decide what model suits you best.

For the uninitiated, this might seem like just another leaderboard update. But for those closely watching the AI industry, it's a testament to how fierce the competition is and how swiftly the tides can turn. And as for the rest of us who sit in between those two camps, it's a reminder that in the AI world, today’s most popular model could fall to the most efficient.
