Google is putting the icing on the cake for a busy week in the generative AI space with the launch of Imagen 3, its brand-new text-to-image model. This release builds upon the success of Imagen 2, introduced in December 2023, which already rivaled industry heavyweights like Dall-E 3 and MidJourney v5.

Imagen 3, originally announced in May, is better than its predecessor at understanding and executing complex prompts, generating images with finer detail and closer prompt adherence. It is also versatile, producing good results in styles that range from photorealism to art and 3D compositions.

"Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting, and fewer distracting artifacts than our previous models," Google said in its official announcement.

Imagen 3's prompt improvements allow users to describe desired images in natural language without complex prompt engineering. The model's training also incorporated richer image captions, enabling it to capture nuanced details like specific camera angles or compositions, and to follow long text prompts when needed.

The tech giant has placed particular emphasis on Imagen 3's enhanced text rendering capabilities. Although text rendering is noticeably improved, our initial tests show that it is not quite on par with other models like Dall-E 3, Auraflow, or Flux.

Generations by Imagen 3 and Grok 2 using the same prompt

Google has also stressed its commitment to safety and responsibility in the development and deployment of Imagen 3. The company implemented what it described as “extensive filtering and data labeling” processes to minimize harmful content in the model's training datasets. Additionally, Google said it conducted thorough evaluations, including red team exercises, to identify and fix potential vulnerabilities.

It is also important to note that Imagen 3 integrates SynthID, Google's watermarking tool. SynthID embeds a digital signature directly into the pixels of generated images. This watermark is imperceptible to the human eye but detectable by specialized software, providing a means to identify AI-generated content.

Currently, Imagen 3 is available through Google's ImageFX platform and Vertex AI. Looking ahead, Google plans to introduce popular editing features from Imagen 2, such as inpainting (editing elements in the image) and outpainting (expanding it), to Imagen 3 in the coming months. The company has also announced intentions to expand Imagen 3's availability across its broader product ecosystem, including integration into the Gemini app, Google Workspace, and Google Ads.

This release is part of a broader Google strategy that aims to put Gemini and AI technology in basically all of its services and hardware. This week, the company introduced its new Pixel 9 lineup, which was designed with AI capabilities at its core. The new Pixel phones can handle certain generative AI tasks locally, including text-based tasks and small image generations.

The release of Imagen 3 comes amid a flurry of activity in the AI image generation space. Elon Musk's xAI recently unveiled Grok 2, featuring the Flux.1 image generator, which has gained attention for its ability to produce highly realistic, uncensored images alongside strong text generation capabilities.

Meanwhile, MidJourney, another key player in the field, announced an imminent v6.2 update to its model. The company also teased the development of MidJourney v7, slated for release in the coming months. Ideogram, another contender in the AI image generation arena, has also hinted at a forthcoming update to its model. Finally, the Open Model Initiative has chosen Flux.1 as the foundation for developing its state-of-the-art open-source image generation model.

Edited by Ryan Ozawa.
