The day after OpenAI's much-hyped announcement of GPT-4o, its improved “omnimodal” large language model, Google fired back with a barrage of upgrades to its Gemini AI offerings, flexing its technological prowess, leaning into its live search advantages, and solidifying its standing against mindshare leader ChatGPT.
Building upon its strengths, Google is infusing generative AI into its search experience, enabling users to interact naturally with its search engine rather than relying on keyword-based queries. The keynote included a demonstration of a Google search query about removing a coffee stain. Instead of merely displaying links to webpages with instructions, the search engine immediately provided a comprehensive answer generated by AI.
These AI-generated results, designed to address user queries directly and efficiently, will be displayed above search results.
Throughout the presentation, Google made clear that its dominance in web search translated into a key advantage for its AI initiatives, showing off how various features could tap into current information rather than relying on a dated snapshot like other large language models (LLMs).
One of the standout features announced is “Ask Photos,” which allows users to have natural conversations with Gemini to search for information in their gallery. While Google Photos has long allowed people to search their image library for specific people, objects, or words, the AI-infused update supports open-ended, natural-language queries.
For example, a Google user asked Gemini what his car’s license plate number was. Gemini scoured all of his photos, evaluated them, and provided the correct answer.
Another upgrade would be familiar to users of a litany of AI meeting assistants, including those built into online conference platforms like Zoom. In Google Meet, Gemini can now analyze meetings, summarize them, and generate responses to questions in the chat. After a meeting, Gemini provides a list of action items and task assignments.
The biggest news involved upgrades under the hood. Google today announced the release of Gemini 1.5 Pro, boasting a staggering context window of 1 million multimodal tokens. That capacity dwarfs GPT-4's 128,000-token limit and is already available for both developers and consumers in Gemini Advanced—the tech giant’s paid AI services tier.
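For a sense of scale, here is a minimal sketch of how a developer might verify that an entire document fits in that window before querying it, assuming Google's google-generativeai Python SDK and its published "gemini-1.5-pro-latest" model name; the file path and API key are placeholders:

```python
import google.generativeai as genai

# Placeholder key; real keys come from Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Load a very large document, e.g., a book or an entire codebase dump.
with open("big_document.txt", "r", encoding="utf-8") as f:
    document = f.read()

# Count tokens before sending, to stay within the 1-million-token window.
count = model.count_tokens(document)
print(f"Document size: {count.total_tokens:,} tokens")

if count.total_tokens <= 1_000_000:
    response = model.generate_content(
        [document, "Summarize the key points of this document."]
    )
    print(response.text)
```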

Google says it plans to expand its token-handling capacity even further later this year, potentially reaching 2 million tokens for developers, more than 15 times the 128,000-token limit of GPT-4o.
Thanks to that massively increased capacity, Google also showed off Gemini’s impressive retrieval capabilities. This is a key feature because, until now, even powerful LLMs like Claude and GPT-4 have shown performance degradation—“forgetting” information discussed earlier—when prompted with huge amounts of data.
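Long-context retrieval of this sort is commonly measured with so-called needle-in-a-haystack tests, in which a single fact is buried inside a mass of irrelevant text and the model is asked to recall it. Here is a minimal sketch of such a test, again assuming the google-generativeai SDK; the API key, filler text, and buried fact are all illustrative:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Bury one "needle" fact in a haystack of repetitive filler text
# (roughly a few hundred thousand tokens at this size).
needle = "The secret passphrase is 'tangerine-owl-42'."
filler = "The sky was clear and the market was quiet that day. " * 20_000
haystack = filler[: len(filler) // 2] + needle + filler[len(filler) // 2:]

# A model with strong long-context recall should return the passphrase
# rather than "forgetting" it amid the filler.
response = model.generate_content(
    [haystack, "What is the secret passphrase mentioned above?"]
)
print(response.text)
```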
Besides its top-of-the-line models, Google launched Gemini 1.5 Flash, a compact multimodal LLM designed to compete against Claude 3 Haiku and GPT-3.5 at providing quick answers. However, its 1-million-token capacity positions it as the most powerful "light" model available to date.
Probably the most interesting announcement was Google’s Project Astra, a universal AI agent that can be personalized and tailored to each user's needs. Google pointed out that the Astra presentation was recorded in real time, likely in response to OpenAI’s live GPT-4o demo yesterday. The interaction seemed more capable and less clunky than GPT-4o, albeit with responses that were more matter-of-fact and less human-like.
Here's a demo of the new Project Astra assistant! Pretty cool to see it on smart glasses too. Some of these agent experiences will come to the Gemini app later this year. pic.twitter.com/hGk6bbIzUD
— Mishaal Rahman (@MishaalRahman) May 14, 2024
While Gemini's voice is also broadly natural, it lacks the emotional—or even “flirty”—quality of OpenAI's new ChatGPT voice. Google’s priority appears to be functionality, versus OpenAI’s emphasis on more human-like interactions.
Going beyond traditional language models, Google introduced cross-platform, customizable AI agents that it says are capable of reasoning, planning, and remembering. These abilities allow Gemini to behave like a group of specialized AIs working together.
These customizable agents, which Google calls "Gems," seem to be a response to OpenAI’s customizable GPTs. Gems integrate with Google's ecosystem, offering features like real-time language translation, contextual search, and personalized recommendations. Users can shape a Gem to focus on specific tasks or topic areas, or to adopt a particular tone.
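Google did not detail a public API for Gems, but a roughly similar effect (a Gemini model pinned to one task, topic, and tone) can be sketched with the system_instruction parameter in the google-generativeai SDK; the translator persona below is a hypothetical example, not Google's Gems format:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Approximate a "Gem": a Gemini model pinned to a single task and tone.
# The instruction text is a hypothetical example, not Google's Gems format.
translator_gem = genai.GenerativeModel(
    "gemini-1.5-flash-latest",
    system_instruction=(
        "You are a real-time language translator. Translate every user "
        "message into French, keeping the tone casual and concise."
    ),
)

chat = translator_gem.start_chat()
print(chat.send_message("Where is the nearest train station?").text)
```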

Google also announced new generative AI models for images, video, and music. Imagen 3, Google's new image generator, produces highly realistic and detailed images, a contrast with the often cartoonish look of OpenAI's output. Google also claims it excels at rendering text within images, a capability OpenAI likewise says it has improved.
The company also launched an upgraded version of MusicLM for generative music enthusiasts.
The icing on the cake was Veo, a generative video model announced ahead of the release of OpenAI’s much-touted but still unreleased Sora video tool. The unedited raw output suggests a quality level comparable to the forthcoming OpenAI entry. Google says it will make Veo available in a few weeks—a timeline that could beat Sora to market.
Introducing Veo: our most capable generative video model. 🎥
It can create high-quality, 1080p clips that can go beyond 60 seconds.
From photorealism to surrealism and animation, it can tackle a range of cinematic styles. 🧵 #GoogleIO pic.twitter.com/6zEuYRAHpH
— Google DeepMind (@GoogleDeepMind) May 14, 2024
Toward the end of its two-hour-plus keynote, Google also showed some love to the open-source community, unveiling PaliGemma, an open-source vision-language model. The company also promised to launch Gemma 2—the next iteration of its open-source large language model—in June. The new model will have an extended token context window and will be more powerful and accurate.
Finally, Google announced that it would release its suite of Gemini-powered features on its Android mobile operating system first. The move mirrors OpenAI’s apparent favoritism toward Apple’s macOS and iOS platforms, where it has rolled out its latest updates before bringing them to Windows, the operating system made by its top investor, Microsoft.