Although AI exploded onto the scene through sometimes eerily clever chatbots, text-based interactions are already old-fashioned. The announcement of OpenAI's GPT-4 update introduced GPT-Vision (GPT-V), the latest multimodal AI marvel. That announcement has now become reality, as users finally get a chance to test the full extent of its abilities.
A multimodal large language model (LLM) can interact not only through the written word but also through other modes. In this case, GPT-V can understand images and work with them. And thanks to the new generative art tool DALL-E 3, ChatGPT can both take images as input and generate images as output.
These new capabilities have raised eyebrows across the tech space as users put them through their paces. Can they decode redacted government documents on UFO sightings? Yes. "ChatGPT-4V Multimodal decodes a redacted government document on a UFO sighting released by NASA," one tweet raves. "Maybe the truth isn't out there; it's right here in GPT-V."
Filling gaps in a string of text is essentially what LLMs do, so the user did the next best thing to test GPT-V's capabilities: he censored parts of a text himself and had the model guess what was missing. “Nearly 100% intent accuracy,” he reported.
Of course, it's hard to verify whether its guess at what's otherwise obscured is accurate—it’s not like we can ask the CIA how well it did peering through the black lines.
Even harder than uncovering information that has been censored by the government is trying to understand your doctor's cryptic handwriting. But GPT-V can unscramble the scribble. With a polite prompt, GPT-V can make sense of even the most indecipherable doctor's notes, ensuring that "take two tablets" doesn't become "bake blue waffles."
But be careful. Even the most advanced AI is sometimes no match for the handwriting of an experienced (or arthritic) doctor, and it may still take an expert to decipher those written enigmas.
And for those who don’t trust their doctors, ChatGPT can provide an instant second opinion. The model can understand X-rays and provide analysis and insights into specific medical cases.
But why stop at handwriting and body scans? GPT-V has become the latest home fitness guru, curating workout plans tailored to your home equipment and goals. And if you're curious about how many calories are in that meal you’re about to eat, GPT-V's got your back. One user gleefully shared, "OK ChatGPT 4.0 with new vision features... recognizes everything. Even a seal on the beach."
Interior design enthusiasts, rejoice! The AI now offers design suggestions and can incorporate personal preferences. Imagine a living space that screams "you," without the hefty designer fees. Just take a picture of your awful room and ask GPT-V how to turn it into the paradise you want it to be.
Homework woes? Just screenshot the assignment, and GPT-V takes the role of that helpful classmate you always wished sat next to you.
And for the finance geeks among us, GPT-V isn't just about fun and games: it can dive deep into technical analysis. Upload a screenshot of your favorite (or most hated) stock or crypto chart, and it will analyze the chart and make projections accordingly. Just remember that this isn't financial advice; if you end up poor, no AI will make you rich.
The dawn of multimodal LLMs is redefining industries, and as the AI giants race ahead, GPT-V is only the tip of the iceberg. Google’s upcoming Gemini is rumored to outperform Bard thanks to its multimodal prowess. NExT-GPT offers an open-source alternative, and the horizon promises models trained to juggle words, sounds, videos, and images.
Such advancements aren't just technobabble: they could reshape our daily interactions, our professions, and perhaps even our worldview. And while OpenAI leads the way with GPT-V, competitors aren't far behind. Could we be on the brink of an AI renaissance?
Well, if you're still using AI just for chat, you might already be falling behind. AI can now read and see, and it gains new capabilities every day.
GPT-V can also ruin the fun of a "Where's Waldo?" book. Why would someone want this? This is ChaosGPT territory.