The day after OpenAI's much-hyped announcement of GPT-4o, its improved “omnimodal” large language model, Google fired back with a barrage of upgrades to its Gemini AI offerings, flexing its technological prowess, leaning into its live search advantages, and solidifying its standing against mindshare leader ChatGPT.
Building upon its strengths, Google is infusing generative AI into its search experience, enabling users to interact naturally with its search engine rather than relying on keyword-based queries. The keynote included a demonstration of a Google search query about removing a coffee stain. Instead of merely displaying links to webpages with instructions, the search engine immediately provided a comprehensive answer generated by AI.
These AI-generated results, designed to address user queries directly and efficiently, will be displayed above search results.
Throughout the presentation, Google made clear that its dominance in web search translated into a key advantage for its AI initiatives, showing off how various features could tap into current information rather than relying on a dated snapshot like other large language models (LLMs).
One of the standout features announced is “Ask Photos,” which allows users to have natural conversations with Gemini to search for information in their gallery. While Google Photos has long allowed people to search their image library for specific people, objects, or words, the AI-infused update supports open-ended, natural-language queries.
For example, a Google user asked Gemini what his car’s license plate number was. Gemini scoured all of his photos, evaluated them, and provided the correct answer.
Another upgrade would be familiar to users of a litany of AI meeting assistants, including those built into online conference platforms like Zoom. In Google Meet, Gemini can now analyze meetings, summarize them, and generate responses to questions in the chat. After a meeting, Gemini provides a list of action items and task assignments.
The biggest news involved upgrades under the hood. Google today announced the release of Gemini 1.5 Pro, boasting a staggering context window of 1 million multimodal tokens. That capacity dwarfs GPT-4's 128,000-token limit and is already available for both developers and consumers in Gemini Advanced—the tech giant’s paid AI services tier.
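For a sense of scale, here is a minimal sketch of how a developer might verify that an entire document fits in that window before querying it, assuming Google's google-generativeai Python SDK and its published "gemini-1.5-pro-latest" model name; the file path and API key are placeholders:

```python
import google.generativeai as genai

# Placeholder key; real keys come from Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Load a very large document, e.g., a book or an entire codebase dump.
with open("big_document.txt", "r", encoding="utf-8") as f:
    document = f.read()

# Count tokens before sending, to stay within the 1-million-token window.
count = model.count_tokens(document)
print(f"Document size: {count.total_tokens:,} tokens")

if count.total_tokens <= 1_000_000:
    response = model.generate_content(
        [document, "Summarize the key points of this document."]
    )
    print(response.text)
```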

Google says it plans to expand its token-handling capacity even further later this year, potentially reaching 2 million tokens for developers, more than 15 times the 128,000-token limit of GPT-4o.
Thanks to that massively increased capacity, Google also showed off Gemini’s impressive retrieval capabilities. This is a key feature because, until now, even powerful LLMs like Claude and GPT-4 have shown performance degradation—“forgetting” information discussed earlier—when prompted with huge amounts of data.
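Long-context retrieval of this sort is commonly measured with so-called needle-in-a-haystack tests, in which a single fact is buried inside a mass of irrelevant text and the model is asked to recall it. Here is a minimal sketch of such a test, again assuming the google-generativeai SDK; the API key, filler text, and buried fact are all illustrative:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Bury one "needle" fact in a haystack of repetitive filler text
# (roughly a few hundred thousand tokens at this size).
needle = "The secret passphrase is 'tangerine-owl-42'."
filler = "The sky was clear and the market was quiet that day. " * 20_000
haystack = filler[: len(filler) // 2] + needle + filler[len(filler) // 2:]

# A model with strong long-context recall should return the passphrase
# rather than "forgetting" it amid the filler.
response = model.generate_content(
    [haystack, "What is the secret passphrase mentioned above?"]
)
print(response.text)
```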
Besides its top-of-the-line models, Google launched Gemini 1.5 Flash, a compact multimodal LLM designed to compete against Claude 3 Haiku and GPT-3.5 at providing quick answers. However, its 1-million-token capacity positions it as the most powerful "light" model available to date.
Probably the most interesting announcement was Google’s Project Astra, a universal AI agent that can be personalized and tailored to each user's needs. Google pointed out that the Astra presentation was recorded in real time, likely in response to OpenAI’s live GPT-4o demo yesterday. The interaction seemed more capable and less clunky than GPT-4o, albeit with responses that were more matter-of-fact and less human-like.
Here's a demo of the new Project Astra assistant! Pretty cool to see it on smart glasses too. Some of these agent experiences will come to the Gemini app later this year. pic.twitter.com/hGk6bbIzUD
— Mishaal Rahman (@MishaalRahman) May 14, 2024
While Gemini's voice is also broadly natural, it lacks the emotional—or even “flirty”—quality of OpenAI's new ChatGPT voice. Google’s priority appears to be functionality, versus OpenAI’s emphasis on more human-like interactions.
Going beyond traditional language models, Google introduced cross-platform, customizable AI agents that it says are capable of reasoning, planning, and remembering. These abilities allow Gemini to behave like a group of specialized AIs working together.
These customizable agents, which Google calls "Gems," seem to be a response to OpenAI’s customizable GPTs. Gems integrate with Google's ecosystem, offering features like real-time language translation, contextual search, and personalized recommendations. Users can shape a Gem to focus on specific tasks or topic areas, or to adopt a particular tone.
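Google did not detail a public API for Gems, but a roughly similar effect (a Gemini model pinned to one task, topic, and tone) can be sketched with the system_instruction parameter in the google-generativeai SDK; the translator persona below is a hypothetical example, not Google's Gems format:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Approximate a "Gem": a Gemini model pinned to a single task and tone.
# The instruction text is a hypothetical example, not Google's Gems format.
translator_gem = genai.GenerativeModel(
    "gemini-1.5-flash-latest",
    system_instruction=(
        "You are a real-time language translator. Translate every user "
        "message into French, keeping the tone casual and concise."
    ),
)

chat = translator_gem.start_chat()
print(chat.send_message("Where is the nearest train station?").text)
```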

Google also announced new generative AI models for images, video, and music. Imagen 3, Google's new image generator, produces highly realistic and detailed images, a contrast with the often cartoonish look of OpenAI's output. Google also claims it excels at rendering text within images, a capability OpenAI likewise says it has improved.
The company also launched an upgraded version of MusicLM for generative music enthusiasts.
The icing on the cake was Veo, a generative video model announced ahead of the release of OpenAI’s much-touted but still unreleased Sora video tool. The unedited raw output suggests a quality level comparable to the forthcoming OpenAI entry. Google says it will make Veo available in a few weeks—a timeline that could beat Sora to market.
Introducing Veo: our most capable generative video model. 🎥
It can create high-quality, 1080p clips that can go beyond 60 seconds.
From photorealism to surrealism and animation, it can tackle a range of cinematic styles. 🧵 #GoogleIO pic.twitter.com/6zEuYRAHpH
— Google DeepMind (@GoogleDeepMind) May 14, 2024
Toward the end of its two-hour-plus keynote, Google also showed some love to the open-source community, unveiling PaliGemma, an open-source vision-language model. The company also promised to launch Gemma 2—the next iteration of its open-source large language model—in June. The new model will have an extended token context window and will be more powerful and accurate.
Finally, Google announced that it would release its suite of Gemini-powered features on its Android mobile operating system first. The move mirrors OpenAI’s apparent favoritism toward Apple’s macOS and iOS platforms, where it has rolled out its latest updates before bringing them to Windows, the operating system made by its top investor, Microsoft.