RNA function follows form – why is it so hard to predict?

AlphaFold’s highly accurate structural models transformed protein biology, but RNA lags behind

RNA structure poses unique challenges for computational models. Credit: Getty
At a virtual conference in November 2020, the winner of a biennial protein-structure-prediction challenge was announced: AlphaFold. Created by Google DeepMind, this computational tool had blown its competitors out of the water by solving dozens of protein structures with atomic-level accuracy, accomplishing a feat that researchers had been attempting for decades.
The challenge, known as the Critical Assessment of Protein Structure Prediction (CASP), was launched in 1994 to advance computational tools for modelling 3D protein configurations from their amino-acid sequences. Teams of scientists pit their computational models against each other, trying to generate the most accurate predictions for protein structures that have been solved experimentally before the event, using methods such as X-ray crystallography and cryo-electron microscopy, but not yet made public.
AlphaFold’s 2020 predictions rivalled those solved with these tried-and-tested techniques, and it has since become a favourite of the structural-biology community. Its repository — the AlphaFold Protein Structure Database — contains some 200 million structures, and, in 2024, AlphaFold’s developers shared half of the Nobel Prize in Chemistry for their work.
But that’s proteins. In 2022, the organizers of CASP turned their attention to a different, and arguably more challenging, class of biomolecules: RNA.
As with proteins, determining RNA structure typically requires costly and time-consuming experimental methods. Computational tools can help, but RNA is a tougher nut to crack. One simple reason, according to Yu Li, a computer scientist at the Chinese University of Hong Kong, is historical. For a long time, most scientists didn’t think RNA biology was interesting enough to study. But RNA also poses unique molecular challenges, and relatively few data are available to train computational models of the type that have performed so well with proteins.
Researchers have been getting creative, however, and a growing set of computational tools is emerging to aid the prediction of RNA structure. Many of these incorporate the latest developments in artificial intelligence (AI), including the large language models (LLMs) that underlie popular chatbots such as ChatGPT.
“RNA folding is a very tough problem,” concedes Shi-Jie Chen, a computational biophysicist at the University of Missouri in Columbia. But AI, he adds, is getting “better and better”.
Elusive targets
For a long time, RNA was seen simply as an intermediary between two more interesting classes of molecule: DNA, the ‘blueprint of life’, and proteins, the ‘building blocks’ of the cell. Only a small fraction of the human genome encodes proteins, yet much of the non-coding genome is transcribed into RNA. Over the past few decades, scientists have discovered that these non-coding RNAs mediate essential functions in healthy cells — and contribute to many diseases.
How these RNAs work remains, in many cases, a mystery. Researchers hope that, by determining their shape, they will better understand the role that these molecules have in making our cells tick, a question of form dictating function. “In biology, we assume that the sequence is very likely to determine the structure, and that the structure is very likely to determine the function,” says Li.
But computational tools for predicting RNA structure lag behind their protein equivalents. Even AlphaFold3, the latest version of DeepMind’s structure-prediction tool, falls short when it comes to RNA.
“If you look at the recent CASP competitions, we are at the point where, on the protein structure side of things, fully automated teams are as good as human teams,” says Lydia Freddolino, a systems biologist at the University of Michigan in Ann Arbor and a scientific advisory board member for CircNova, a company that uses deep-learning tools to design circular RNA-based therapeutics. “For RNA, we are nowhere near that — all the top groups make heavy use of human intervention.”
RNA-structure prediction featured in CASP competitions in 2022 and 2024, and Freddolino participated in both. The team that ranked first for predicting RNA structures at the latest event, CASP16, used a hybrid approach that combined AI with a defined, physics-based algorithm. According to Chen, who led the winning group, the team first used AlphaFold3 to generate ensembles of possible RNA structures, and then applied a physics-based model that probes the ‘energy landscape’ of possible structures to pinpoint the conformations that are most likely to form. (Chen’s team has licensed its software to several biotechnology firms.)
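The article describes the winning strategy only at a high level. The sketch below is a hypothetical illustration, not Chen’s software. It shows one textbook way to pick likely conformations from an ensemble: Boltzmann weighting, in which a candidate with free energy E gets a probability proportional to exp(−E/RT). The helper names (`boltzmann_weights`, `rank_ensemble`), the 310 K temperature and the candidate energies are all assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K): Boltzmann's constant expressed per mole.
R_KCAL = 0.0019872041


def boltzmann_weights(energies, temperature=310.0):
    """Return the Boltzmann probability of each conformation.

    energies: free energies in kcal/mol (lower means more stable).
    The lowest energy is subtracted first so that exp() cannot overflow;
    this shift cancels out when the weights are normalized.
    """
    rt = R_KCAL * temperature
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(factors)  # partition function over the sampled ensemble only
    return [f / z for f in factors]


def rank_ensemble(candidates, temperature=310.0):
    """Sort hypothetical (name, energy) pairs from most to least probable."""
    names = [name for name, _ in candidates]
    weights = boltzmann_weights([energy for _, energy in candidates], temperature)
    return sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)


# Hypothetical energies for three candidate models produced by a generator.
ensemble = [("model_a", -42.1), ("model_b", -45.3), ("model_c", -40.8)]
for name, prob in rank_ensemble(ensemble):
    print(f"{name}: {prob:.3f}")
```

The weights fall off exponentially, so a gap of a few kcal/mol is enough for one candidate to dominate. Scoring a fixed list in this way only reranks what the generator proposed. It cannot recover a conformation that is missing from the ensemble.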
Researchers developing AI-only tools for predicting RNA structure face numerous obstacles. One is that RNA molecules have features that make their structures inherently hard to predict. RNA molecules have more flexible backbones than do proteins, and their structures are more dynamic, meaning that they can undergo substantial conformational changes while carrying out their biological tasks.
On top of that, RNA molecules lack the varied chemistries found in proteins, such as acidic and basic residues, that allow stable connections to form. Instead, segments of RNA interact in all kinds of “weird and wonderful ways”, Freddolino says, such as through a variety of base pairings and the involvement of metal ions. As a result, the subtle differences between the best and worst models are harder to spot than they are for proteins.

Natural (R1116 and R1149) and synthetic (R1138) RNA structures, used in the structure-prediction task CASP15, measured experimentally (grey) and predicted using an AI tool (red). Credit: W. Wang et al./Nature Commun.
The chemical alphabet of RNA is also harder to interpret: the four chemical bases that make up RNA are less distinct than the 20 amino acids found in proteins. That means that each RNA base contains less information than an amino acid. One reason tools such as AlphaFold have been so successful, Freddolino notes, is the ability to use large sequence databases to pinpoint patterns of interactions between different amino acids — and this is much more difficult to do with RNA.
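Some rough numbers help here. If every symbol were equally likely, a position drawn from four bases could carry at most log2 4 = 2 bits, compared with log2 20 ≈ 4.32 bits for an amino acid. The second function below is a minimal sketch of how correlated changes (covariation) between two columns of a sequence alignment can flag interacting positions. It assumes a plain mutual-information score and is not the method used by AlphaFold or any other tool named here. The function names and the toy alignment are invented for illustration.

```python
import math
from collections import Counter


def max_bits_per_residue(alphabet_size):
    """Upper bound on information per position: log2 of the alphabet size."""
    return math.log2(alphabet_size)


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    High values mean the two positions change together across related
    sequences, a covariation signal that hints the positions interact.
    """
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


print(f"RNA base:   {max_bits_per_residue(4):.2f} bits")
print(f"amino acid: {max_bits_per_residue(20):.2f} bits")

# Toy alignment of four related RNAs (made-up sequences). Positions 0 and 5
# always form a Watson-Crick pair, so they covary; position 2 never changes.
alignment = ["GACCAC", "CACCAG", "AACCAU", "UACCAA"]
columns = list(zip(*alignment))
print(mutual_information(columns[0], columns[5]))  # covarying pair
print(mutual_information(columns[0], columns[2]))  # conserved column, no signal
```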
And then there’s the paucity of known RNA structures. The Protein Data Bank, a repository of 3D macromolecular structures, contains nearly 200,000 protein structures and fewer than 2,000 RNAs. This dearth of data means that there is less information to feed the algorithms that underlie AI-based structure prediction.
“We’re doing as well as we can with the limited data that we have,” says Jim Collins, a biomedical engineer at the Massachusetts Institute of Technology in Cambridge. “The field would advance considerably with the collection and curation of many more structures.”
Bringing in AI
Researchers have been working to address these challenges, and, in recent years, several AI-based RNA-structure-prediction tools have emerged. Before 2020, most of the methods for predicting RNA structure were based on algorithms defined by specific physical or mathematical models, according to Jianyi Yang, a computational biologist at Shandong University in Qingdao, China. But the success of AlphaFold has inspired people in the RNA field to apply AI to this problem, too, he says.
Yang and his colleagues designed a fully automated (and freely available) AI tool, trRosettaRNA, which combines deep learning with elements of Rosetta, a software suite for determining molecular structures that was created by David Baker at the University of Washington in Seattle. Baker shared the 2024 chemistry Nobel with the creators of AlphaFold.
Just as for proteins, RNA structure can be described at multiple levels: the nucleotide sequence (primary); intermediary structures that form when bases pair with their complements (secondary); and the final 3D structure (tertiary). RNAs can also form complexes with each other and with other molecules (quaternary). Starting from the primary sequence, trRosettaRNA first predicts secondary structure. Then, with the help of a classical physics-based model, it reconstructs the tertiary structure. Secondary structures, such as ‘hairpins’ that form when short segments of sequence pair up with one another, are much more important for RNA than they are for proteins, Yang says, and using these in-between structures is one of the keys to the model’s success.
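The article does not describe how trRosettaRNA predicts secondary structure. As a stand-in, the sketch below uses the classic Nussinov dynamic-programming algorithm. This textbook method simply maximizes the number of allowed base pairs (Watson-Crick pairs plus G-U wobble pairs), with a minimum hairpin loop of three bases. It writes the result in dot-bracket notation: matching brackets mark paired bases, so a hairpin appears as a run of ‘(’ and ‘)’ around a stretch of dots. Modern predictors use energy models or deep learning rather than pair counting. The names `nussinov`, `PAIRS` and `MIN_LOOP`, and the pairing rules, are simplifying assumptions for illustration.

```python
# Allowed pairs: Watson-Crick (A-U, G-C) plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # a hairpin loop needs at least 3 unpaired bases


def nussinov(seq):
    """Return a dot-bracket structure maximizing the number of base pairs."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs within seq[i..j]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score
    structure = ["."] * n
    _traceback(seq, best, 0, n - 1, structure)
    return "".join(structure)


def _traceback(seq, best, i, j, structure):
    """Recover one optimal set of pairs, marking them with brackets."""
    if j - i <= MIN_LOOP:
        return
    if best[i][j] == best[i][j - 1]:
        _traceback(seq, best, i, j - 1, structure)
        return
    for k in range(i, j - MIN_LOOP):
        if (seq[k], seq[j]) in PAIRS:
            left = best[i][k - 1] if k > i else 0
            if best[i][j] == left + best[k + 1][j - 1] + 1:
                structure[k], structure[j] = "(", ")"
                if k > i:
                    _traceback(seq, best, i, k - 1, structure)
                _traceback(seq, best, k + 1, j - 1, structure)
                return


print(nussinov("GGGAAACCC"))  # prints (((...))): a three-pair hairpin
```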
Yang’s team pitted trRosettaRNA against other automated tools and found, in an assessment using two independent data sets of dozens of RNAs, that it was more accurate than those tools (ref. 1). In 2024, the software placed fourth at CASP16.
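The article does not say which accuracy measure Yang’s team used. One widely used score for comparing a predicted 3D model with an experimental structure is the root-mean-square deviation (RMSD) of matched atom positions, where lower is better. The minimal sketch below assumes the two structures have already been superimposed. A full comparison would first find the best rigid-body fit, for example with the Kabsch algorithm. The function name and coordinates are invented for illustration.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the structures are already superimposed and that atom i in one
    list corresponds to atom i in the other.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(predicted))


# Made-up coordinates (in ångströms) for three matched atoms.
reference = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (6.0, 0.0, 1.0), (12.0, 0.0, 1.0)]
print(rmsd(predicted, reference))  # every atom is off by 1 Å, so RMSD is 1.0
```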
Nature 639, 1106-1108 (2025)
doi: https://doi.org/10.1038/d41586-025-00920-8
This story originally appeared in Nature. Author: Diana Kwon