A group of U.S. researchers has developed an artificial intelligence model called CancerGPT, which uses large pre-trained language models (LLMs) to predict how different drug combinations might affect rare human tissues found in cancer patients. This new approach could represent major progress in the field of medical research, particularly in areas where structured data and sample size are limited.
The study, conducted by a joint team from the University of Texas and the University of Massachusetts, used LLMs to extract prior knowledge from medical research texts and apply it to biological inference tasks. The team reports that the model remained accurate even when little or no tissue-specific training data was available.
“Our experiments, which involved seven rare tissues from different cancer types, demonstrated that the LLM-based prediction model achieved significant accuracy with very few or zero samples,” the research paper reads.
The use of LLMs in medical research has been a hot topic in 2023. Decrypt recently reported that Ankh, an LLM that understands how proteins communicate, was created by a group of experts from the universities of Munich and Columbia in collaboration with the biotech company Protinea. Another group of researchers used AI technology to identify three promising candidates for senolytic drugs, which kill so-called “zombie cells” and have the potential to slow the aging process and mitigate age-related diseases.
CancerGPT is an LLM with approximately 124 million parameters, and its performance was comparable to that of the much larger fine-tuned GPT-3 model, which has approximately 175 billion parameters. The study also used GPT-3 in a zero-shot setting, that is, without any task-specific training examples, to generate responses. The researchers evaluated the answers to different tasks by comparing them with existing scientific literature and found that the LLM provided mostly accurate arguments.
They also noted, however, that “the accuracy of its arguments cannot always be verified and may be susceptible to hallucination.”
The researchers believe that valuable information about cancer types with limited structured data is still represented in the scientific literature. By leveraging pre-trained language models, they were able to make use of existing resources and obtain “generalizability,” improving the model's capacity to make predictions for cases it has not yet seen.
Generalizability is the ability of a model to apply what it has learned from the training data to predict new, unseen data. This is one of the things that differentiates AI from traditional deterministic computer programs.
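To make the idea concrete, here is a minimal sketch in Python. It is not the researchers' method and has nothing to do with CancerGPT itself: it fits a simple straight line to a handful of made-up training points and then checks how well that line predicts points it never saw. The data, the function names (fit_line, mean_abs_error), and the choice of a linear model are all assumptions made purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the model's predictions and the true values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Made-up toy data that roughly follows y = 2x + 1 with a little noise.
train_x = [0, 1, 2, 3, 4]
train_y = [1.1, 2.9, 5.2, 6.8, 9.1]

# Held-out points the model never sees while it is being fitted.
test_x = [5, 6, 7]
test_y = [11.0, 13.1, 14.9]

model = fit_line(train_x, train_y)
train_error = mean_abs_error(model, train_x, train_y)
test_error = mean_abs_error(model, test_x, test_y)

print(f"error on training data: {train_error:.3f}")
print(f"error on unseen data:   {test_error:.3f}")
```

A model that generalizes well keeps its error on the unseen points close to its error on the training points. A model that merely memorized its training data would do well on the first number and poorly on the second.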
The researchers recommend that future studies delve deeper into the approach and develop an ensemble method that effectively utilizes both existing structured features and newly surfaced prior knowledge encoded in LLMs.
Despite the potential challenges, the study results highlight the value of AI technology in modern biology. From enhancing personalization to increasing efficiency and boosting success rates, AI is proving to be a game-changer.