There’s More Evidence ChatGPT Is a Good Doctor But a Bad Coder

Two research papers evaluating ChatGPT's accuracy conclude that the AI chatbot seems to be better at writing prescriptions than writing code.

3 min read

Aug 11, 2023

In the race to develop advanced artificial intelligence, not all large language models are created equal. Two new studies reveal striking differences in the capabilities of popular systems like ChatGPT when put to the test on complex real-world tasks.

According to researchers at Purdue University, ChatGPT struggles with even basic coding challenges. The team evaluated ChatGPT's responses to over 500 questions on Stack Overflow, an online community for developers and programmers, on topics like debugging and API usage.

"Our analysis shows that 52% of ChatGPT-generated answers are incorrect and 77% are verbose," the researchers wrote. "However, ChatGPT answers are still preferred 39.34% of the time due to their comprehensiveness and well-articulated language style."

In contrast, a study from UCLA and the Pepperdine University of Malibu demonstrates ChatGPT's prowess at answering difficult medical exam questions. When quizzed on over 850 multiple-choice questions in nephrology, an advanced specialty within internal medicine, ChatGPT scored 73% —similar to the passing rate for human medical residents.

Image credit: UCLA via Arvix

"The demonstrated current superior capability of GPT-4 in accurately answering multiple-choice questions in Nephrology points to the utility of similar and more capable AI models in future medical applications," the UCLA team concluded.

Anthropic’s Claude AI was the second best LLM with 54.4% correct answers. The team evaluated other open-source LLMs but they were far from acceptable, with the best score being 25.5% achieved by Vicuna.

So why does ChatGPT excel at medicine but flounder at coding? The machine learning models have different strengths, notes MIT computer scientist Lex Fridman. Claude, the model behind ChatGPT’s medical knowledge, received additional proprietary training data from its maker Anthropic. OpenAI’s ChatGPT relied only on publicly available data. AI models do great things if properly traiend with huge amounts of data, even better than most other models.

Image courtesy: MIT

However, an AI won’t be able to act properly outside the parameters it was trained on, so it will try to create content with no prior knowledge of it, which results in what's known as hallucinations. If the dataset of an AI model does not include a specific content, it will not be able to yield good results in that area.

As the UCLA researchers explained, "Without negating the importance of the computational power of specific LLMs, the lack of free access to training data material that is currently not in public domain will likely remain one of the obstacles to achieving further improved performance for the foreseeable future."

ChatGPT clunking at coding aligns with other assessments. As Decrypt previously reported, researchers at Stanford and UC Berkeley found ChatGPT’s math and visual reasoning skills declined sharply between March and June 2022. Though initially adept at primes and puzzles, by summer it scored only 2% on key benchmarks.

So while ChatGPT can play doctor, it still has much to learn before becoming an ace programmer. But it’s not far from reality, after all, how many doctors do you know that are also proficient hackers?

Get crypto news straight to your inbox--

sign up for the Decrypt Daily below. (It’s free).

Get Email!

China’s Z.AI Releases GLM-5.2: A Model That Rivals Claude Opus—Using Zero Nvidia Chips

Z.ai dropped GLM-5.2 on June 16, promising top level performances, beating its already advanced GLM 5.1. The Beijing-based lab, which has been on the U.S. Entity List since January 2025, appears to be benefiting from growing concerns over America's approach to AI. Over the past week, the ban on Anthropic Fable and the release of this new model have helped drive zAI's stock up 90%, sending it to a new all-time high. GLM 5.2 has the numbers to back up the hype. On FrontierSWE—a benchmark that eva...

Midjourney Pivots From AI Images to Medical Imaging, Aiming to Build a Better MRI Alternative

Midjourney, the AI company best known for its image generation platform of the same name, is expanding into healthcare. On Wednesday, the company unveiled Midjourney Medical, a new division developing what it calls "Ultrasonic CT," a full-body imaging system that combines ultrasound hardware with AI-powered image reconstruction. Midjourney says the technology could create detailed three-dimensional body scans in roughly 60 seconds. “Our goal at Midjourney Medical is to deploy around 50,000 of t...

CFTC Hits Celsius Crypto Fraudster Alex Mashinsky With Permanent Trading Ban

The Commodity Futures Trading Commission (CFTC) has resolved its 2023 enforcement action against Celsius founder Alex Mashinsky, permanently banning him from trading markets regulated by the CFTC. The consent order also imposes a permanent CFTC registration ban on the former crypto founder, and marks the completion of the regulator’s first case against a digital asset lending platform, according to its 2023 press release. Mashinsky, who also acted as the CEO of Celsius, was imprisoned for 12 y...

News

Courses

Deep Dives

Coins

Videos