The day after OpenAI's much-hyped announcement of GPT-4o, its improved “omnimodal” large language model, Google fired back with a barrage of upgrades to its Gemini AI offerings, flexing its technological prowess, leaning into its live search advantages, and solidifying its standing against mindshare leader ChatGPT.
Building upon its strengths, Google is infusing generative AI into its search experience, enabling users to interact naturally with its search engine rather than relying on keyword-based queries. The keynote included a demonstration of a Google search query about removing a coffee stain. Instead of merely displaying links to webpages with instructions, the search engine immediately provided a comprehensive answer generated by AI.
These AI-generated results, designed to address user queries directly and efficiently, will be displayed above search results.
Throughout the presentation, Google made clear that its dominance in web search translates into a key advantage for its AI initiatives, showing how various features can tap into current information rather than relying on a dated snapshot like other large language models (LLMs).
One of the standout features announced is “Ask Photos,” which allows users to have natural conversations with Gemini to search for information in their gallery. While Google Photos has long allowed people to search their image library for specific people, objects, or words, the AI-infused update supports open-ended, natural-language queries.
For example, a Google user asked Gemini what his car’s license plate number was. Gemini scoured his photo library and returned the correct answer.
Another upgrade will feel familiar to users of the many AI meeting assistants, including those built into video conferencing platforms like Zoom. In Google Meet, Gemini can now analyze meetings, summarize them, and generate responses to questions in the chat. After a meeting, Gemini provides a list of action items and task assignments.
The biggest news involved upgrades under the hood. Google today announced the release of Gemini 1.5 Pro, boasting a staggering context window of 1 million multimodal tokens. That capacity dwarfs GPT-4's 128,000-token limit, and the model is already available to developers and to consumers through Gemini Advanced, the tech giant’s paid AI services tier.
Image: Google
Google says it plans to expand that capacity even further later this year, potentially reaching 2 million tokens for developers, more than 15 times GPT-4o's limit.
Thanks to the massively increased capacity, Google also showed off Gemini’s impressive retrieval capabilities. That matters because even powerful LLMs like Claude and GPT-4 have shown performance degradation, “forgetting” information discussed earlier, when prompted with huge amounts of data.
Besides its top-of-the-line models, Google launched Gemini 1.5 Flash, a compact multimodal LLM designed to compete with Claude 3 Haiku and GPT-3.5 on fast answers. Even so, its 1-million-token capacity positions it as the most powerful "light" model available to date.
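For developers who want to try the long context window, both models are exposed through Google's google-generativeai Python SDK. Here is a minimal sketch, assuming an API key and the 1.5-series model identifiers Google published; the file path and prompts are placeholders:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Gemini 1.5 Pro: the 1M-token window can hold an entire book or codebase.
pro = genai.GenerativeModel("gemini-1.5-pro-latest")
with open("long_report.txt") as f:  # hypothetical large document
    report = f.read()

response = pro.generate_content(
    [f"Here is a long report:\n{report}",
     "List every deadline mentioned, with its surrounding context."]
)
print(response.text)

# Gemini 1.5 Flash: same API surface, tuned for speed on lighter tasks.
flash = genai.GenerativeModel("gemini-1.5-flash-latest")
print(flash.generate_content(f"Summarize in one sentence:\n{report}").text)
```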
Probably the most interesting announcement was Google’s Project Astra, a universal AI agent that can be personalized and tailored to each user's needs. Google pointed out that the Astra presentation was recorded in real time, likely in response to OpenAI’s live GPT-4o demo yesterday. The interaction seemed more capable and less clunky than GPT-4o's, albeit with more matter-of-fact, less human-like responses.
While Gemini's voice is also broadly natural, it lacks the emotional—or even “flirty”—quality of OpenAI's new ChatGPT voice. Google’s priority appears to be functionality, versus OpenAI’s emphasis on more human-like interactions.
Going beyond traditional language models, Google introduced cross-platform customizable AI agents that it says are capable of reasoning, planning, and remembering across interactions. These abilities let Gemini behave like a group of specialized AIs working together.
These customizable agents, which Google calls "Gems," appear to be a response to OpenAI’s customizable GPTs. Gems integrate with Google's ecosystem, offering features like real-time language translation, contextual search, and personalized recommendations. Users can shape a Gem to focus on a specific task or topic area, or to adopt a specific tone.
Image: Google
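Google has not published a dedicated Gems API, but the system-instruction support in the same SDK gives a rough feel for how a task-pinned "Gem" behaves. A hypothetical sketch; the persona and prompts below are invented for illustration:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Pin the model to one task and tone, roughly as a "Gem" would be.
translator = genai.GenerativeModel(
    "gemini-1.5-flash-latest",
    system_instruction=(
        "You are a real-time translator. Render everything the user "
        "writes into French, keeping a neutral, professional tone."
    ),
)

chat = translator.start_chat()
print(chat.send_message("Where is the nearest train station?").text)
```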
Google also announced new generative AI models for images, video, and music. Imagen 3, Google's new image generator, produces highly realistic, detailed images, a contrast to the more cartoonish look of OpenAI's DALL-E 3. Google also claims it excels at rendering text within images, a capability OpenAI says it has improved as well.
The company also launched an upgraded version of MusicLM for generative music enthusiasts.
The icing on the cake was Veo, a generative video model announced ahead of the release of OpenAI’s much-touted but still unreleased Sora video tool. The raw, unedited output suggests a quality level comparable to the forthcoming OpenAI entry, and Google says Veo will be available in a few weeks, a timeline that could beat Sora to market.
Toward the end of its two-hour-plus keynote, Google also showed some love to the open-source community, unveiling PaliGemma, an open-source vision-language model. The company also promised to launch Gemma 2, the next iteration of its open-source large language model, in June, with an extended context window and improved power and accuracy.
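PaliGemma's weights are published on Hugging Face, so it can be run locally with the transformers library. A minimal sketch, assuming the 3B mixed-task checkpoint; the checkpoint name and image file are assumptions, and the weights require accepting Google's license on the Hub:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

# Ask the vision model to caption a local image.
image = Image.open("photo.jpg")  # hypothetical input image
inputs = processor(text="caption en", images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0], skip_special_tokens=True))
```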
Finally, Google announced that its suite of Gemini-powered features will arrive first on its Android mobile operating system. The move mirrors OpenAI’s apparent favoritism toward Apple’s macOS and iOS platforms, where it has been releasing its latest updates before bringing them to Windows, the platform built by its top investor, Microsoft.