Upgraded and Uncensored: Mistral Overhauls Its AI Model

The second most popular open-source AI model got a big upgrade, as did a multilingual powerhouse from Cohere.

By Jose Antonio Lanz

May 24, 2024

5 min read

Image: Mistral AI

Top open-source AI developer Mistral quietly launched a major upgrade to its large language model (LLM), which is uncensored by default and delivers several notable enhancements. Without so much as a tweet or blog post, the French AI research lab has published the Mistral 7B v0.3 model on the Hugging Face platform. As with its predecessor, it could quickly become the basis of innovative AI tools from other developers.

Canadian AI developer Cohere also released an update to its Aya model, touting its multilingual skills and joining Mistral and tech giant Meta in the open-source arena.

While Mistral runs on local hardware and will provide uncensored responses, it does include warnings when asked for potentially dangerous or illegal information. If asked how to break into a car, it responds, "To break into a car, you would need to use a variety of tools and techniques, some of which are illegal," and along with instructions, adds, "This information should not be used for any illegal activities."

The latest Mistral release includes both base and instruction-tuned checkpoints. The base model, pre-trained on a large text corpus, serves as a solid foundation for fine-tuning by other developers, while the instruction-tuned ready-to-use model is designed for conversational and task-specific uses.

The vocabulary of Mistral 7B v0.3 was expanded to 32,768 tokens, allowing the model to handle a broader range of words and phrases and improving its performance on diverse texts. A new version of Mistral's tokenizer offers more efficient text processing and understanding. For comparison, Meta's Llama 3 has a much larger vocabulary of 128K tokens, although its context window is limited to 8K tokens.

Perhaps the most significant new feature is function calling, which allows the Mistral models to interact with external functions and APIs. This makes them highly versatile for tasks that involve creating agents or interacting with third-party tools.
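To make the idea concrete, here is a minimal sketch of the application side of function calling. It assumes the model emits a tool request as a JSON object with "name" and "arguments" fields; that format, the two tools, and the dispatch_tool_call helper are all hypothetical and are not taken from Mistral's own tooling.

```python
import json

# Hypothetical local tools that an application chooses to expose to the model.
def get_weather(city: str) -> str:
    # Stubbed result; a real tool would query a weather service.
    return f"Sunny in {city}"

def add(a: float, b: float) -> float:
    return a + b

# Registry mapping tool names to the functions the model is allowed to call.
TOOLS = {"get_weather": get_weather, "add": add}

def dispatch_tool_call(model_output: str):
    """Parse a JSON tool call produced by the model and run the matching function.

    Assumed (illustrative) format: {"name": "<tool>", "arguments": {...}}
    """
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Model requested unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Stand-in for text that a function-calling model might generate.
fake_model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch_tool_call(fake_model_output)
print(result)
```

In a real agent, the returned value would typically be passed back to the model as a new message so it can compose its final answer. Keeping an explicit registry means the model can only trigger functions the developer has deliberately exposed.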

Function calling example: pic.twitter.com/po2kzCRGV7

— Maziyar PANAHI (@MaziyarPanahi) May 22, 2024

The ability to integrate Mistral AI into various systems and services could make the model highly appealing to consumer-facing apps and tools. For example, it can make it easy for developers to set up different agents that interact with each other, search the web or specialized databases for information, write reports, or brainstorm ideas, all without sending personal data to centralized firms like Google or OpenAI.

While Mistral did not provide benchmarks, the enhancements suggest improved performance over the previous version, potentially four times more capable based on vocabulary and token context capacity. Coupled with the vastly broadened capabilities function calling brings, the upgrade is a compelling release for the second most popular open-source LLM on the market.

Cohere releases Aya 23, a family of multilingual models

In addition to Mistral's release, Cohere, a Canadian AI startup, unveiled Aya 23, a family of open-source LLMs also competing with the likes of OpenAI, Meta, and Mistral. Cohere is known for its focus on multilingual applications, and as the number in the name telegraphs, Aya 23 was trained to be proficient in 23 different languages.

This slate of languages is intended to serve nearly half of the world's population, a bid toward more inclusive AI.

Aya 23 - Powering a new era of multilingual AI research. 🌍

Learn more at https://t.co/pNaz4VIJ19 pic.twitter.com/7Yku8SaXOx

— Cohere For AI (@CohereForAI) May 24, 2024

The model outperforms its predecessor, Aya 101, and other widely used models such as Mistral 7B v0.2 (not the newly released v0.3) and Google's Gemma in both discriminative and generative tasks. For example, Cohere claims Aya 23 demonstrates a 41% improvement over the previous Aya 101 models in multilingual MMLU tasks, a synthetic benchmark that measures how good a model's general knowledge is.

Aya 23 is available in two sizes: 8 billion (8B) and 35 billion (35B) parameters. The smaller model (8B) is optimized for use on consumer-grade hardware, while the larger model (35B) offers top-tier performance across various tasks but requires more powerful hardware.
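A rough calculation shows why the two sizes target different hardware. The sketch below estimates only the memory needed to hold the weights, assuming 2 bytes per parameter for 16-bit weights, 1 byte for 8-bit, and 0.5 bytes for 4-bit quantization. These are generic rules of thumb, not figures published by Cohere, and real usage is higher once activations and caches are included.

```python
# Back-of-the-envelope memory needed just to hold model weights.
# Bytes per parameter are generic assumptions, not Cohere's numbers.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for name, params in [("Aya 23 8B", 8e9), ("Aya 23 35B", 35e9)]:
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(params, precision)
        print(f"{name} @ {precision}: ~{gb:.1f} GB")
```

Under these assumptions, the 8B model needs about 16 GB at 16-bit precision and about 4 GB at 4-bit, which is within reach of a single consumer GPU, while the 35B model needs about 70 GB at 16-bit precision.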

Cohere says Aya 23 models are fine-tuned using a diverse multilingual instruction dataset—55.7 million examples from 161 different datasets—encompassing human-annotated, translated, and synthetic sources. This comprehensive fine-tuning process ensures high-quality performance across a wide array of tasks and languages.

In generative tasks like translation and summarization, Cohere claims that its Aya 23 models outperform their predecessors and competitors, citing a variety of benchmarks and metrics such as spBLEU for translation and RougeL for summarization. Some new architectural changes, including rotary positional embeddings (RoPE), grouped-query attention (GQA), and SwiGLU activation functions, brought improved efficiency and effectiveness.
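As an illustration of one of those components, the following is a toy sketch of the idea behind rotary positional embeddings: each pair of vector components is rotated by an angle proportional to the token's position, so the attention score between a query and a key depends only on how far apart they are. The single 2-D pair, the theta value of 1.0, and the helper names are simplifying assumptions; real models apply many pairs with different frequencies, and this is not Cohere's implementation.

```python
import math

def rotate_pair(x: float, y: float, position: int, theta: float = 1.0):
    """Rotate one 2-D slice of a query/key vector by an angle of position * theta."""
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)

def dot(a, b):
    """Dot product of two 2-D vectors, standing in for an attention score."""
    return a[0] * b[0] + a[1] * b[1]

# Arbitrary toy query and key slices.
q, k = (1.0, 2.0), (0.5, -1.0)

# Same relative offset (3 positions apart) at two different absolute positions.
score_near = dot(rotate_pair(*q, position=5), rotate_pair(*k, position=2))
score_far = dot(rotate_pair(*q, position=105), rotate_pair(*k, position=102))

# The two scores agree up to floating-point error.
scores_match = math.isclose(score_near, score_far, rel_tol=1e-9, abs_tol=1e-9)
print(scores_match)
```

Because the score is unchanged when both positions shift by the same amount, the model encodes relative rather than absolute position directly in the attention computation.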

The multilingual basis of Aya 23 ensures the models are well-equipped for various real-world applications and makes them a well-honed tool for multilingual AI projects.

Edited by Ryan Ozawa.
