Has AI solved a 50-year-old biological problem?

Last November it was announced that an artificial intelligence (AI) network developed by DeepMind, AlphaFold 2, had achieved astounding results in an international protein structure prediction competition. The result sent ripples of excitement through the scientific community. It provided tantalising evidence that one of the grand challenges in biology had been solved “decades before many people in the field would have predicted”, according to Venki Ramakrishnan, Nobel laureate in chemistry.

Proteins are essential to all life and are required for almost every biological function, from catalysing the metabolic reactions that give us energy to contracting muscles. Membrane proteins alone are the targets of over half of all pharmaceutical drugs. Proteins are biological polymers made of long chains of amino acids, with the sequence of amino acids and their interactions determining the shape of the protein. This idea, that a protein’s structure should be fully described by its amino acid sequence, was proposed by Christian Anfinsen in his 1972 Nobel Prize lecture, sparking 50 years of headaches for computational biologists.

A protein’s structure is hugely important to its function. For example, an antibody relies on the precise shape of its antigen-binding site to bind to a specific part of an antigen, tagging it for attack by other parts of the immune system. So, what’s the problem? Well, due to the flexibility of the bonds joining the amino acids in the protein chain, there is a ridiculously large number of possible structures, which makes predicting the right one extremely difficult.
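To get a feel for how quickly the number of possible structures grows, here is a back-of-the-envelope sketch in Python. The numbers are illustrative assumptions, not measurements: it supposes each amino acid can adopt just three conformations and that a hypothetical computer could check ten trillion structures per second.

```python
# Rough count of possible chain conformations.
# ASSUMPTIONS (illustrative only): 3 conformations per amino acid,
# and a hypothetical machine testing 1e13 structures per second.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues, states_per_residue=3):
    """Number of distinct conformations if every residue varies independently."""
    return states_per_residue ** n_residues


def years_to_enumerate(n_conformations, samples_per_second=1e13):
    """Years needed to check every conformation one by one."""
    return n_conformations / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    count = conformation_count(100)  # a modest 100-residue protein
    print(f"conformations: {count:.2e}")
    print(f"years to check them all: {years_to_enumerate(count):.2e}")
```

Even under these generous assumptions a 100-residue chain has roughly 5 x 10^47 conformations, so trying them one by one is hopeless.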

Yet the results released at CASP14 (CASP stands for Critical Assessment of protein Structure Prediction), a biennial competition between over 100 international research groups, do indicate that the problem has been solved. CASP involves teams competing to predict protein structures from only their amino acid sequences. These predicted structures are then compared to the experimentally determined structures. Importantly, the test is conducted double-blind; neither the competitors nor the organisers know the structures of the proteins when predictions are made. The main metric used at CASP is the Global Distance Test (GDT), scored from 0 to 100, with a score of around 90 informally considered on par with experimental techniques. AlphaFold 2 achieved a median score of 92.4 GDT overall and 87.0 in the extremely challenging free-modelling category. Ground-breaking.
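For readers curious how such a score is put together, the sketch below computes a simplified GDT-style total score: the average, over distance cut-offs of 1, 2, 4 and 8 ångströms, of the percentage of residues whose predicted position lies within that cut-off of the experimental one. It assumes the two structures have already been superimposed and uses made-up coordinates; the real CASP procedure also searches for the best superposition, which is omitted here, and the function name `gdt_ts` is just a label for this illustration.

```python
import math

# Simplified GDT-style score. ASSUMPTION: both structures are already
# superimposed; the real procedure also optimises the superposition.


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of residues within each distance cut-off (in angstroms)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and the same length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Made-up coordinates for a four-residue chain (one point per residue).
    experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
    print(gdt_ts(predicted, experimental))
```

A perfect prediction scores 100; each residue that drifts further from its experimental position drops out of the tighter cut-offs first and pulls the average down.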

There are two main reasons the DeepMind team achieved such remarkable results. First, they were informed by recent ideas from academic research. One of these ideas involved utilising a multiple sequence alignment (MSA) to predict close contacts between amino acids at various positions in the protein’s sequence by discovering evolutionary couplings. The idea is simpler than it initially sounds and boils down to this: if two amino acids are in close physical contact then mutations in one will be followed by mutations in the other to preserve structure. AlphaFold 2 contained an end-to-end deep learning architecture that was fed an MSA as input before directly outputting a structure. The second is simply that DeepMind is an extremely well-resourced industrial lab. However, their success does still raise questions about the efficiency of academic research, and even more damning questions for the pharmaceutical industry, for whom the question of protein folding is of the utmost importance, given that a small team could outperform them both in just a couple of years.
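The evolutionary coupling idea can be illustrated with a toy calculation. The sketch below measures how strongly two columns of an alignment vary together using mutual information, a classic co-variation statistic. This is not DeepMind’s method: AlphaFold 2 learns from the alignment with a neural network, and practical coupling analyses also correct for effects this sketch ignores. The four-sequence alignment and the function name are invented for illustration.

```python
import math
from collections import Counter

# Toy co-variation measure between two alignment columns.
# ASSUMPTION: a tiny invented alignment; no corrections for shared
# ancestry or indirect couplings, which real methods need.


def mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi


if __name__ == "__main__":
    toy_msa = [
        "AKLE",
        "AKLD",
        "ARVE",
        "ARVD",
    ]
    # Columns 1 and 2 always change together (K with L, R with V).
    print(mutual_information(toy_msa, 1, 2))
    # Columns 1 and 3 change independently of each other.
    print(mutual_information(toy_msa, 1, 3))
```

Pairs of columns that mutate in step score highly and are candidates for being in physical contact, while independently varying or fully conserved columns score zero.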

Overall, while the success of AlphaFold 2 tells us little about the actual process of protein folding, its astounding powers of prediction are truly a massive step forward for science. The protein folding problem has been solved, and with it many other problems facing humanity, from drug discovery to waste processing, have moved closer to a solution. We can only hope DeepMind makes AlphaFold 2 publicly available in some form to ensure that this astounding development changes the world for the better in all the ways it has the potential to.

Image: Eelke via Flickr
