Tether's Medical AI Runs on Your Phone and Outperforms Models 16x Its Size

QVAC MedPsy squeezes clinical AI onto a smartphone, with Tether claiming it beats Google's MedGemma-27B on real-world scenarios while generating roughly a third as many tokens per response.

By Jose Antonio Lanz

Tether, the stablecoin company best known for USDT, just released a medical AI model that fits in your pocket and may outperform rivals more than a dozen times its size. QVAC MedPsy launched today from Tether's AI Research Group as a new class of medical language models designed to run on smartphones, wearables, and edge devices—no cloud required.

The headline number: a 1.7 billion-parameter model that Tether says beats Google's MedGemma-4B on medical benchmarks despite being less than half its size. On HealthBench Hard, the toughest subset of OpenAI's HealthBench, which evaluates AI on realistic, multi-turn clinical conversations using rubrics developed with 262 physicians, Tether says the same 1.7 billion-parameter model outscores MedGemma-27B, a model nearly sixteen times larger.
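For readers who want to check the size comparisons, here is a minimal sketch of the arithmetic. It uses only the nominal parameter counts quoted above; the dictionary and function names are illustrative, and the real checkpoints may differ slightly from their rounded labels.

```python
# Nominal parameter counts in billions, as quoted in the article.
# Actual checkpoint sizes may differ slightly from these rounded labels.
PARAMS_B = {
    "QVAC MedPsy 1.7B": 1.7,
    "MedGemma-4B": 4.0,
    "MedGemma-27B": 27.0,
}


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times more parameters `larger` has than `smaller`."""
    return PARAMS_B[larger] / PARAMS_B[smaller]


for rival in ("MedGemma-4B", "MedGemma-27B"):
    ratio = size_ratio(rival, "QVAC MedPsy 1.7B")
    print(f"{rival} is about {ratio:.1f}x the size of QVAC MedPsy 1.7B")
```

On these figures, MedGemma-4B comes out at about 2.4 times the size of the 1.7 billion-parameter model and MedGemma-27B at about 15.9 times, which is where "less than half" and "nearly sixteen times" come from.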

Parameters are the numerical values a model learns during training. In theory, the more parameters a model has, the more capable it should be.

Source: Tether

The test suite spans MedQA-USMLE, which measures clinical knowledge using US medical licensing exam-style questions scored as percentage accuracy, all the way to AfriMedQA, which tests performance specifically for underserved African healthcare contexts.

Tether CEO Paolo Ardoino credited the gains to efficiency rather than scale. "With QVAC MedPsy, our focus was improving efficiency at the model level, rather than scaling up size," he said in a statement. "Our 4 billion model exceeded results from models nearly seven times its size, while using up to three times fewer tokens per response."

That token efficiency is the other headline. The 4B model averages around 909 tokens per response versus 2,953 for comparable systems, a roughly 3.2x reduction. Fewer tokens mean lower compute cost, faster responses and, crucially, a workload light enough to run locally without a cloud backend.
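The token figures can be checked the same way. The sketch below takes the two averages reported by Tether at face value; the function names are illustrative, and the calculation says nothing about how the averages were measured.

```python
def reduction_factor(baseline_tokens: float, model_tokens: float) -> float:
    """How many times fewer tokens the model emits than the baseline."""
    return baseline_tokens / model_tokens


def fraction_saved(baseline_tokens: float, model_tokens: float) -> float:
    """Share of the baseline's tokens that the model avoids generating."""
    return 1 - model_tokens / baseline_tokens


# Average tokens per response, as reported by Tether.
BASELINE_TOKENS = 2953  # "comparable systems"
MEDPSY_4B_TOKENS = 909  # QVAC MedPsy 4B

print(f"Reduction: {reduction_factor(BASELINE_TOKENS, MEDPSY_4B_TOKENS):.1f}x")
print(f"Tokens saved: {fraction_saved(BASELINE_TOKENS, MEDPSY_4B_TOKENS):.0%}")
```

Put another way, a 3.2x reduction means the 4B model generates roughly 69% fewer tokens per response than the systems it was compared against.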

"You can run medical reasoning where the data already exists, inside a hospital system or on a device, without moving sensitive information through the cloud or waiting on external processing," Ardoino said.

The models ship as quantized GGUF files, 1.2 GB for the 1.7 billion-parameter model and 2.6 GB for the 4 billion-parameter one, and Tether says the compressed versions retain most of their benchmark performance while fitting on standard consumer hardware. That means a hospital system, rural clinic, or individual clinician could run the model entirely on-device, keeping patient records out of third-party cloud infrastructure and limiting HIPAA exposure.
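To get a rough sense of how much compression those file sizes imply, the sketch below estimates the average number of bits stored per parameter. It assumes decimal gigabytes (1 GB = 10^9 bytes) and the nominal parameter counts, and it ignores file metadata. The article does not say which quantization scheme Tether used, so treat the results as back-of-the-envelope estimates only.

```python
def bits_per_weight(file_size_gb: float, params_billions: float) -> float:
    """Average bits per parameter implied by a file size.

    Assumes 1 GB = 1e9 bytes, so the 1e9 factors cancel, and treats the
    whole file as weights (metadata is ignored).
    """
    return file_size_gb * 8 / params_billions


def fp16_size_gb(params_billions: float) -> float:
    """Size in GB of the same weights at 16 bits (2 bytes) per parameter."""
    return params_billions * 2


# (file size in GB, parameters in billions), as quoted in the article.
MODELS = {
    "QVAC MedPsy 1.7B": (1.2, 1.7),
    "QVAC MedPsy 4B": (2.6, 4.0),
}

for name, (size_gb, params_b) in MODELS.items():
    print(
        f"{name}: ~{bits_per_weight(size_gb, params_b):.1f} bits per weight, "
        f"versus {fp16_size_gb(params_b):.1f} GB unquantized at 16-bit"
    )
```

Under those assumptions, both files work out to between five and six bits per weight, compared with the 16 bits per weight that is common for unquantized models. That would shrink the 1.7 billion-parameter model from about 3.4 GB to the 1.2 GB quoted.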

The privacy pitch may be a major plus for some people, but using AI for medical opinions is still far from ideal. An Oxford study published in February found that LLMs routinely give dangerous medical advice, with wrong answers, confused guidance and poor handling of nuanced symptoms. The researchers stopped short of dismissing the technology entirely, but argued AI has a role as "secretary, not physician." The compliance problem compounds it: Most medical AI today routes patient data through cloud servers, creating HIPAA exposure every time a doctor types a query.

The release fits Tether's pattern over the past year. Last month it shipped the QVAC SDK, an open-source toolkit for building local, offline AI apps across iOS, Android, Windows, and Linux. Before that, it launched QVAC Health, a consumer wellness app that keeps biometric data entirely on-device. MedPsy is the first QVAC model specifically trained for clinical reasoning.

The medical AI market sits at roughly $36 billion today, with projections pointing past $500 billion by 2033, per Tether's own announcement. Models and GGUF weights are available now at qvac.tether.io/models.
