Anthropic has just released Claude 2.1, a large language model (LLM) with a 200,000-token context window, a feature that outpaces the recently announced 128K context of OpenAI's GPT-4 Turbo.
This strategic release brings a context window more than one and a half times the size of its closest rival's, and is the fruit of an extended partnership with Google that gave the startup access to Google's most advanced Tensor Processing Units.
“Our new model Claude 2.1 offers an industry-leading 200K token context window, a 2x decrease in hallucination rates, system prompts, tool use, and updated pricing,” Anthropic said in a tweet earlier today. The introduction of Claude 2.1 responds to the growing demand for AI that can process and analyze long-form documents with precision.
This new upgrade means Claude users can now engage with documents as extensive as entire codebases or classic literary epics, unlocking potential across various applications from legal analysis to literary critique.
AI researcher Greg Kamradt quickly put the Claude 2.1 model to the test. He found that OpenAI's model was more consistent at lower token counts, while Claude's results varied more from prompt to prompt at different document lengths.
“Starting at around 90K tokens, performance of recall at the bottom of the document started to get increasingly worse,” he concluded. His investigation found similar degradation levels for GPT-4 Turbo at around 65K tokens. “I’m a big fan of Anthropic,” he posted, adding that the company is “helping to push the bounds on LLM performance and creating powerful tools for the world.”
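The article does not describe Kamradt's test setup. One common way to probe position-dependent recall is to plant a known fact (a "needle") at a chosen depth in a long filler document and check whether the model's answer repeats it. The sketch below illustrates only that idea. Every name in it (`insert_needle`, `recalled`, the filler text, the needle) is hypothetical, and the model call itself is left out.

```python
# Minimal sketch of a position-dependent recall probe.
# All names and sample strings are hypothetical, not Kamradt's actual code.

def insert_needle(filler_sentences, needle, depth):
    """Return a new list with `needle` placed at a fractional depth.

    depth = 0.0 puts the needle at the top of the document,
    depth = 1.0 puts it at the bottom.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    return filler_sentences[:position] + [needle] + filler_sentences[position:]


def recalled(answer, expected_phrase):
    """Case-insensitive check that the model's answer contains the planted fact."""
    return expected_phrase.lower() in answer.lower()


filler = [f"Filler sentence number {i}." for i in range(10)]
needle = "The secret code is 4921."

# Build documents with the needle at the top, middle, and bottom.
documents = {d: " ".join(insert_needle(filler, needle, d)) for d in (0.0, 0.5, 1.0)}

# A real test would send each document plus a question to the model
# and pass the reply to recalled(); that step is omitted here.
```

Repeating this across several document lengths and depths gives a grid of pass/fail results, which is how a claim such as "recall at the bottom of the document degrades past 90K tokens" could be checked.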
Anthropic's commitment to reducing AI errors shows in Claude 2.1's improved accuracy: the company claims a 50% reduction in hallucination rates, meaning the model makes false statements about half as often as Claude 2.0. Anthropic says these improvements were tested against a set of complex, factual questions designed to probe current model limitations. As Decrypt previously reported, hallucinations were one of Claude’s weaknesses. Such a drastic increase in accuracy would put the LLM in closer competition with GPT-4.
With the introduction of an API tool use feature, Claude 2.1 also integrates more seamlessly into advanced users' workflows, demonstrating its ability to orchestrate across various functions, search the web, and pull from private databases. While still in beta, this feature promises to extend Claude's utility across a spectrum of operations, from complex numerical reasoning to making product recommendations.
Additionally, Anthropic's Claude 2.1 features “system prompts,” designed to elevate the interaction between the user and the AI. These prompts allow users to set the stage for Claude's tasks by specifying roles, goals, or styles, thus enhancing Claude's ability to maintain character in role-play scenarios, adhere to rules, and personalize responses. This is comparable to OpenAI’s custom instructions, but more extensive in terms of context.
For example, a user could direct Claude to adopt the tone of a technical analyst when summarizing a financial report, ensuring the output aligns with professional standards. Such customization via system prompts may increase accuracy, reduce hallucinations, and improve the overall quality of a piece by making interactions more precise and contextually relevant.
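As a rough illustration of the financial-report example, the sketch below assembles a prompt string in which the system text comes first and the user's request follows. It assumes a turn-marker prompt format where the system prompt sits before the first `Human:` turn. That format is not described in this article, and the function name and sample strings are hypothetical. No API call is made.

```python
# Minimal sketch: placing a system prompt ahead of the user's request.
# Assumption (not from the article): a turn-marker prompt format in which
# system text precedes the first "Human:" turn.

HUMAN_TURN = "\n\nHuman:"
ASSISTANT_TURN = "\n\nAssistant:"


def build_prompt(system_prompt, user_message):
    """Return one prompt string: system text, then the user turn, then an open assistant turn."""
    return f"{system_prompt.strip()}{HUMAN_TURN} {user_message.strip()}{ASSISTANT_TURN}"


# Hypothetical role and request, mirroring the example in the text.
system_prompt = (
    "You are a technical analyst. Summarize financial reports in a "
    "neutral, professional tone and do not speculate beyond the figures given."
)
user_message = "Summarize the attached quarterly report in five bullet points."

prompt = build_prompt(system_prompt, user_message)
print(prompt)
```

Keeping the role and rules in one place, separate from each individual request, is what lets the same instructions apply across a whole conversation.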
However, the full potential of Claude 2.1, with its 200K token context window, is reserved for Claude Pro users, so free users will have to stick with Claude 2, which has a 100K token context window and accuracy that ranks somewhere between GPT-3.5 and GPT-4.
The ripple effects of Claude 2.1’s release are set to influence the dynamics within the AI industry. As businesses and users evaluate their AI options, the enhanced capabilities of Claude 2.1 present new considerations for those seeking to leverage AI for its precision and adaptability.
Edited by Ryan Ozawa.