Anthropic Upgrades Claude With Nearly Twice The Capabilities of GPT-4 Turbo

While the drama at OpenAI captures most of the attention, OpenAI challenger Anthropic delivers the latest version of its capable chatbot.

By Jose Antonio Lanz

Nov 22, 2023

4 min read

Image created by Decrypt using AI

Add on Google

Anthropic has just released Claude 2.1, a large language model (LLM) that offers a 200,000-token context window—a feature that outpaces the recently announced 120K context of GPT-4 Turbo by OpenAI.

This strategic release brings context-handling prowess that nearly doubles that of its closest rival, and is the fruit of an extended partnership with Google that made it possible for the startup to use its most advanced Tensor Processing Units.

“Our new model Claude 2.1 offers an industry-leading 200K token context window, a 2x decrease in hallucination rates, system prompts, tool use, and updated pricing,” Anthropic said in a tweet earlier today. The introduction of Claude 2.1 responds to the growing demand for AI that can process and analyze long-form documents with precision.

Our new model Claude 2.1 offers an industry-leading 200K token context window, a 2x decrease in hallucination rates, system prompts, tool use, and updated pricing.

Claude 2.1 is available over API in our Console, and is powering our https://t.co/uLbS2JNczH chat experience. pic.twitter.com/T1XdQreluH

— Anthropic (@AnthropicAI) November 21, 2023

This new upgrade means Claude users can now engage with documents as extensive as entire codebases or classic literary epics, unlocking potential across various applications from legal analysis to literary critique.

AI researcher Greg Kamradt quickly put the Claude 2.1 model to the test. He found more consistency in OpenAI's model at lower token count but Claude has more varied results according to the prompts at different lengths.

“Starting at around 90K tokens, performance of recall at the bottom of the document started to get increasingly worse,” he concluded. His investigation found similar degradation levels for GPT -4 Turbo at around 65K tokens. “ I’m a big fan of Anthropic—they are helping to push the bounds on LLM performance and creating powerful tools for the world,” he posted.

Claude 2.1 (200K Tokens) - Pressure Testing Long Context Recall

We all love increasing context lengths - but what's performance like?

Anthropic reached out with early access to Claude 2.1 so I repeated the “needle in a haystack” analysis I did on GPT-4

Here's what I found:… pic.twitter.com/B36KnjtJmE

— Greg Kamradt (@GregKamradt) November 21, 2023

Anthropic's commitment to reducing AI errors is evident in Claude 2.1's enhanced accuracy, claiming a 50% reduction in hallucination rates. That adds up to the doubling of truthfulness compared to Claude 2.0. These improvements were rigorously tested against a robust set of complex, factual questions designed to challenge current model limitations. As Decrypt previously reported, hallucinations were one of Claude’s weaknesses. Such a drastic increase in accuracy would put the LLM in closer competition against GPT-4.

With the introduction of an API tool use feature, Claude 2.1 also integrates more seamlessly into advanced users' workflows, demonstrating its ability to orchestrate across various functions, search the web, and pull from private databases. While still in beta, this feature promises to extend Claude's utility across a spectrum of operations, from complex numerical reasoning to making product recommendations.

Additionally, Anthropic's Claude 2.1 features “system prompts,” designed to elevate the interaction between the user and the AI.” These prompts allow users to set the stage for Claude's tasks by specifying roles, goals, or styles, thus enhancing Claude's ability to maintain character in role-play scenarios, adhere to rules, and personalize responses. This is comparable to OpenAI’s custom instructions, but more extensive in terms of context.

For example, a user could direct Claude to adopt the tone of a technical analyst when summarizing a financial report, ensuring the output aligns with professional standards. Such customization via system prompts may increase accuracy, reduce hallucinations, and improve the overall quality of a piece by making interactions more precise and contextually relevant.

However, the full potential of Claude 2.1, with its 200K token context window, is reserved for Claude Pro users, so free users will have to stick to Claude 2 with 100K tokens and an accuracy ranked somewhere between GPT 3.5 and GPT-4.

The ripple effects of Claude 2.1’s release are set to influence the dynamics within the AI industry. As businesses and users evaluate their AI options, the enhanced capabilities of Claude 2.1 present new considerations for those seeking to leverage AI for its precision and adaptability.

Edited by Ryan Ozawa.

Daily Debrief Newsletter

Start every day with the top news stories right now, plus original features, a podcast, videos and more.

Coin Prices