What if AI could talk to proteins? You know, those large, complex molecules that play many critical roles in the body? Well, hold on to your lab coats, because Ankh, a new protein language model, purports to do just that.

Ankh was created by a group of experts from the Technical University of Munich and Columbia University in collaboration with the biotech company Protinea. The name comes from an ancient Egyptian symbol representing life, apt for an AI language model that delves into the very building blocks of life.

According to the team's research paper, Ankh learns the “language of proteins” by analyzing a large dataset of protein sequences. It then uses this knowledge to generate new protein sequences and to predict how they might work.

A protein language model like Ankh and a large language model like ChatGPT work on a similar principle. In proteins, the alphabet consists of amino acids. These amino acids link together to form chains, sort of like letters forming words and sentences. The amino acids must appear in a specific order for the protein to fold into the correct 3D shape, which is essential for its function. Basically, it’s like the way people put words together in a specific language, following a set of rules in order to communicate properly.
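To make the alphabet analogy concrete, here is a minimal Python sketch of how a protein sequence can be turned into the kind of numeric tokens a language model reads. It assumes the 20 standard amino acids written as one-letter codes. The `TOKEN_IDS` vocabulary, the `tokenize` helper, and the example peptide are hypothetical and made up for illustration; this is not Ankh's actual tokenizer.

```python
# The 20 standard amino acids, each written as a one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary for illustration: map each letter to an integer ID.
TOKEN_IDS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence: str) -> list[int]:
    """Convert a protein sequence into a list of integer token IDs."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"Unknown residue codes: {sorted(unknown)}")
    return [TOKEN_IDS[aa] for aa in sequence]


# A short made-up peptide, purely for illustration.
example = "MKTAYIAK"
print(tokenize(example))  # [10, 8, 16, 0, 19, 7, 0, 8]
```

Each letter plays the role that a word or word fragment plays in a text model: once the sequence is a list of numbers, the model can look for patterns in it.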

A large language model works by predicting which word makes the most sense next, given a prompt. Ankh basically tries to do the same, guessing which amino acid sequence makes the most sense given everything the model has learned about proteins and their structural rules.
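The "predict what comes next" idea can be sketched with a toy model that simply counts which amino acid tends to follow which. This is only an illustration of the principle, not how Ankh is built: real protein language models use large neural networks trained on huge sequence datasets. The training sequences and the function names `train_bigram` and `predict_next` are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each amino acid follows each other amino acid."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, residue):
    """Return the residue most often seen after `residue`, or None if unseen."""
    if residue not in counts:
        return None
    return counts[residue].most_common(1)[0][0]


# Made-up training sequences, for illustration only.
training = ["MKTAYIAK", "MKTLLVAG", "MKVAYIAG"]
model = train_bigram(training)
print(predict_next(model, "M"))  # K
```

In this tiny dataset every "M" is followed by "K", so the model guesses "K". A real model weighs far more context than the single preceding letter, but the underlying question is the same: given what has been seen so far, what is most likely to fit?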

Understanding proteins (and their language) is crucial to human biology. They play a key role in the structure, function, and regulation of the body’s tissues and organs.

Ankh’s ability to analyze and predict protein behavior could be extremely helpful in the fields of medicine, environmental science, and more. For instance, in drug discovery, Ankh could be used to predict how proteins will interact with various compounds, which can significantly speed up the development of new medications.

Additionally, it can help scientists understand how mutations in proteins can lead to diseases, which is invaluable in genetic research.

Beyond medicine, Ankh has applications in synthetic biology, where it can be used to design new proteins that exhibit desired functions. This has far-reaching implications for fields like renewable energy and materials science. By designing proteins that can, for example, break down plastics or produce biofuels more efficiently, Ankh can contribute to solving some of the most pressing environmental challenges of our time.

Uses for Ankh. Image: arXiv.org

As recently reported by Decrypt, Stability AI, the company known for the image generator Stable Diffusion, is no stranger to the protein world. It has been dabbling in AI for protein research too. With Ankh also on the scene, it seems like AI is heating up the research field, making proteins as interesting as they can be.

Ankh is publicly available and distributed under the CC BY-NC-SA 4.0 License, so if you want to talk to your proteins, go ahead.

As for what’s next for Ankh? The researchers are tight-lipped, but we can expect rapid development of new features and improvements. And as for Chaos-GPT, the AI bent on world domination? It might want to consider teaming up with Ankh. Because, let’s face it: proteins rule the world.
