From Reading X-Rays to Decoding Classified UFO Reports, ChatGPT Shows Off Its Vision

Twitter is abuzz with examples of GPT-4's new visual abilities. Here are some of the best.

Oct 11, 2023

Although AI exploded onto the scene through sometimes eerily clever chatbots, text-based interactions are already old-fashioned. The announcement of OpenAI's GPT-4 update introduced GPT-Vision (GPT-V), the latest multimodal AI marvel. That announcement has now become reality as users finally get a chance to test the full potential of its abilities.

A multimodal large language model (LLM) is one that can interact not only through the written word, but also through other modes. In this case, the new GPT-V can understand images and work with them. Also, thanks to the new generative art tool DALL-E 3, ChatGPT can both take images as input and generate images as output.

These new capabilities have raised eyebrows across the tech space as users put them through their paces. Can they decode redacted government documents on UFO sightings? Yes. "ChatGPT-4V Multimodal decodes a redacted government document on a UFO sighting released by NASA," one tweet raves. "Maybe the truth isn't out there; it's right here in GPT-V."

Trying to fill gaps in a string of text is basically what LLMs do. The user did the next best thing when trying to test GPT-V’s capabilities and made it guess parts of a text that he censored. "Nearly 100% intent accuracy," he reported.

Of course, it's hard to verify whether its guess at what's otherwise obscured is accurate—it’s not like we can ask the CIA how well it did peering through the black lines.

Even harder than uncovering information that has been censored by the government is trying to understand your doctor's cryptic handwriting. But GPT-V can unscramble the scribble. With a polite prompt, GPT-V can make sense of even the most indecipherable doctor's notes, ensuring that "take two tablets" doesn't become "bake blue waffles."

But be careful. Sometimes even the most advanced AI fails against the hands of an experienced—or arthritic—doctor, and it may take an expert to decipher those written enigmas.

And for those who don’t trust their doctors, ChatGPT can provide an instant second opinion. The model can understand X-rays and provide analysis and insights into specific medical cases.

But why stop at handwriting and body scans? GPT-V has become the latest home fitness guru, curating workout plans tailored to your home equipment and goals. And if you're curious about how many calories are in that meal you’re about to eat, GPT-V's got your back. One user gleefully shared, "OK ChatGPT 4.0 with new vision features... recognizes everything. Even a seal on the beach."

Interior design enthusiasts, rejoice! The AI now offers design suggestions, and can incorporate personal preferences. Imagine a living space that screams "you," without the hefty designer fees. Just take a picture of your awful room and ask GPT-V for suggestions to turn it into the paradise you want it to be.

Homework woes? Just screenshot the assignment, and GPT-V takes the role of that helpful classmate you always wished sat next to you.

And for the finance geeks among us, GPT-V isn't just about fun and games. GPT-V can dive deep into technical analysis. Just input a screenshot of your favorite (or most hated) stock or crypto, and it will analyze your chart and make projections accordingly. Just remember that it's not financial advice—and if you end up poor, no AI will make you rich.

The dawn of multimodal LLMs is redefining industries. With AI titans evolving, GPT-V is only the tip of the iceberg. Google’s upcoming Gemini is rumored to outperform Bard with its multimodal prowess. NexT-GPT offers an open-source alternative, and the horizon promises models trained to juggle words, sounds, videos, and images.

Such advancements aren't just technobabble—they hold implications that could reshape our daily interactions, professions, and perhaps even our worldview. And while OpenAI pioneers with GPT-V, competitors aren't far behind. Could we be on the brink of an AI renaissance?

Well, if you're still using AI just for chat, you might already be falling behind. AI can read and see, and gets more capabilities every day.

GPT-V can also ruin the fun of a "Where's Waldo?" book. Why would someone want this? This is ChaosGPT territory.
