AlphaFold: AI Breakthrough for Protein Folding Problem

by Deep Bandivadekar · February 5, 2021

While the pandemic is still raging, the scientific community has continued to work hard on research in all fields. In November 2020, DeepMind, an artificial intelligence (AI) laboratory based in London, announced a milestone: it had succeeded in solving the decades-old protein folding problem, one of the most challenging open problems in modern science.

What is DeepMind and AlphaFold?

DeepMind started in 2010 as a research group focused on applying AI to video games. It made waves in 2016 when its AI program AlphaGo defeated a human professional Go player ¹. It has since improved its algorithms and gone on to dominate other games, including chess, StarCraft II, and Atari video games. AlphaFold is DeepMind’s latest AI program, which uses supervised learning techniques to attack the computationally expensive problem of protein folding.

The Protein Folding Problem

Instructions in the DNA help build proteins. Image by Gerd Altmann from Pixabay

Proteins are self-assembling bio-machines on a very small scale or, in technical terms, long molecular chains of amino acids. Haemoglobin, keratin, and insulin are a few well-known examples. Scientists are aware of millions of proteins involved in various biological processes. To put what is already known in perspective, however, the Protein Data Bank (PDB) includes only about 170,000 protein structures. The 3D shapes of the vast majority of naturally occurring proteins are still unknown.

These proteins have extremely complex shapes that often look messy. However, the shapes the proteins take are not haphazard; they are determined by chemical interactions at the cellular level. The DNA of a species determines the sequence of the amino acids: the DNA is first transcribed into RNA, which is then translated into protein according to those instructions. Molecular biologists refer to this as the central dogma ². Proteins are built from the RNA in a linear fashion, but they do not remain straight chains for long: they fold and coil up at specific positions. The various parts of the protein, each folding in its own way, add up to a massive, complex 3D structure.
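The flow of information from DNA to RNA to protein can be sketched in a few lines of Python. This is only an illustration: the DNA sequence is made up, the codon table is a tiny subset of the standard genetic code, and the function names `transcribe` and `translate` are chosen here for readability rather than taken from any library.

```python
# Illustrative subset of the standard genetic code (mRNA codon -> amino acid).
# A real table has 64 entries; only the codons used below are listed.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "GUG": "Val",
    "CAC": "His",
    "CUG": "Leu",
    "ACU": "Thr",
    "UAA": None,   # stop codon: ends the chain
}


def transcribe(dna_coding_strand):
    """Return the mRNA for a DNA coding strand (every T becomes U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three letters at a time until a stop codon is reached."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        chain.append(amino_acid)
    return chain


dna = "ATGGTGCACCTGACTTAA"   # made-up example sequence
mrna = transcribe(dna)
protein = translate(mrna)

print(mrna)
print("-".join(protein))
```

The output of `translate` is exactly the kind of 1D amino acid sequence that a structure prediction program takes as its input.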

An example of how complex proteins are shaped: the structure of TMEM171 protein.
Image credit: Bauma319 on WikiCommons

The 3D shape of a protein determines which targets it can attach to, analogous to a key fitting only its designated lock. This means that the specific shape of a protein is its signature and thus defines its function. For most known proteins, we know only the amino acid sequence encoded in the DNA, not the structure. If we know the exact shape of a protein and how it forms, we can use this knowledge to target specific protein functions, for example to treat diseases better, to break down biological waste, or to improve cell performance. Correctly predicting the 3D shape of a protein from its 1D amino acid sequence is the protein folding problem, which has eluded scientists for over half a century.

While X-ray crystallography is a reliable technique for determining protein structures, it is also expensive. Other experimental methods, such as nuclear magnetic resonance (NMR) or electron microscopy, can give clues to protein structures to a certain extent. However, these methods also have limitations: they are cumbersome, expensive, and not universally applicable. As a result, we know the structures of only a small percentage of existing proteins. Researchers have therefore started exploring alternative approaches that use the power of advanced computers, and DeepMind’s AlphaFold is a successful effort in this direction.

A Breakthrough by AlphaFold

Even if computers can be used to do the hard labour, it is no easy task. The number of possible configurations of a naturally occurring protein is astronomical, of the order of 10³⁰⁰ for a standard protein chain molecule ³. Testing each configuration individually would take far longer than the age of the universe. Artificial intelligence offers a way to cut the prediction time. In 1994, researchers started a biennial competition, the Critical Assessment of protein Structure Prediction (CASP) ⁴, to assess and monitor progress by comparing how well various independent algorithms predict protein structures that have been determined experimentally. It is a unique global platform based on shared knowledge.
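To get a feel for the scale of 10³⁰⁰, here is a rough back-of-the-envelope calculation. The sampling rate of 10¹⁵ configurations per second is an assumption picked to be deliberately generous, and the age of the universe is rounded; only the 10³⁰⁰ figure comes from the text above.

```python
import math

CONFIGURATIONS = 10 ** 300           # estimate quoted in the article
SAMPLES_PER_SECOND = 10 ** 15        # assumed, deliberately generous rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10      # approximate value

# Exhaustive search: time = number of configurations / sampling rate.
seconds_needed = CONFIGURATIONS // SAMPLES_PER_SECOND
years_needed = seconds_needed / SECONDS_PER_YEAR
universe_lifetimes = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"years needed:       about 10^{math.log10(years_needed):.0f}")
print(f"universe lifetimes: about 10^{math.log10(universe_lifetimes):.0f}")
```

Even with that generous rate, brute-force enumeration is hopeless, which is why prediction methods have to learn patterns rather than test every configuration.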

The 14th edition of CASP (CASP14) was held in mid-2020, with multiple groups competing to correctly predict the structures of about 90 proteins ⁵. At the end of the competition, on November 30, 2020, DeepMind researchers announced their breakthrough. The latest version of the program, AlphaFold2, had predicted the structures of the target proteins with the highest median score on the Global Distance Test (GDT). In the simplest terms, the GDT score (between 0 and 100) is the percentage of the protein structure that is correctly predicted. A score of around 90 is informally considered comparable to the accuracy of experimental methods; AlphaFold2’s median score was 92.4. Most people had expected this level of accuracy to be a few decades away, so DeepMind has not only accelerated the solution at an incredible pace but also dominated all other programs competing at CASP14.
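A simplified version of the GDT calculation can be sketched as follows. It assumes the predicted and experimental structures are already superimposed and are given as one (x, y, z) coordinate per amino acid, in ångströms; the real GDT procedure also searches for the best superposition, which is left out here. The function name `gdt_ts` and the toy coordinates are illustrative only.

```python
import math


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT score for two already-superimposed coordinate lists.

    For each distance cutoff, compute the fraction of positions whose
    predicted coordinate lies within that cutoff of the experimental one,
    then average the fractions and scale to the range 0 to 100.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy four-residue example: the deviations are 0.5, 1.5, 3.0 and 10.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

score = gdt_ts(predicted, experimental)
print(score)  # 56.25
```

A perfect prediction scores 100, and a single badly misplaced residue, like the last one here, lowers the score at every cutoff.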

DeepMind performed much better than others at CASP14. Pink line refers to AlphaFold2.
(Image used with permission from Prediction Center and CASP)

AlphaFold2 used multiple deep neural networks for different components of the proteins to predict the optimal distance between each pair of amino acids in the final structure. One key aspect that probably made the program more accurate is how it decides which blocks of the protein sequence matter: it uses a numerical confidence measure to select the significant blocks, discards the rest, and then assembles the selected blocks into a final global structure with the maximum likelihood of being correct. The algorithm performed very well even on the most challenging proteins in CASP14, with a median GDT score of 87.
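To make the idea of pairwise distances concrete, the sketch below computes the full table of distances between every pair of positions in a known structure. This is not DeepMind’s method; it only shows the kind of quantity being predicted. The helper name `distance_matrix` and the three-point toy structure are assumptions for illustration.

```python
import math


def distance_matrix(coords):
    """Return a square table where entry [i][j] is the distance between
    position i and position j of the structure."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Toy structure: three points forming a 3-4-5 right triangle.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
matrix = distance_matrix(coords)

for row in matrix:
    print(["{:.1f}".format(d) for d in row])
```

The table is symmetric with zeros on the diagonal. A prediction program works in the opposite direction: it estimates such distances from the sequence alone and then looks for a 3D structure consistent with them.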

The results have far-reaching consequences. They give computational biologists a new tool that is accurate and reliable. In 2020, DeepMind used it to predict several protein structures for the SARS-CoV-2 virus. They are also a marker of how powerful AI can become. The future is certainly bright.

This article was specialist edited by Alexander Telfar and copy-edited by Richard Murchie.
