Questions remain about whether the AI tool for predicting protein structures can really shake up the pharmaceutical industry

AlphaFold touted as next big thing for drug discovery — but is it?

This protein, whose structure was predicted by AlphaFold, is part of the nuclear pore complex, which is a gateway for molecules entering a cell’s nucleus and is a drug target. Credit: DeepMind

After Google DeepMind’s AlphaFold proved in 2020 that it could predict the 3D shapes of proteins with high accuracy, chemists became excited about the promise of using the open-source artificial-intelligence (AI) programme to discover drugs more quickly and cheaply. Most drugs work by binding to various sites on proteins, and AlphaFold could predict the structures of proteins that scientists previously knew little about.

Last month, the biotechnology firm Recursion, based in Salt Lake City, Utah, announced that it had calculated how 36 billion potential drug compounds could bind to more than 15,000 human proteins whose structures were predicted by AlphaFold. To pull off the massive computation, Recursion used its own AI tool, MatchMaker, which ‘matched’ binding pockets on the predicted structures with suitably shaped small molecules, or ligands, from a database called Enamine Real Space.

“Lots of people have predicted how molecules would bind with proteins,” says Chris Gibson, Recursion’s co-founder and chief executive, “but this many predictions is pretty unprecedented.”

But not everyone is as bullish about AlphaFold revolutionizing drug discovery — at least, not yet. In a paper published in eLife the day before Recursion’s announcement¹, a team of scientists at Stanford University in California showed that AlphaFold’s prowess at predicting protein structures doesn’t yet translate into solid leads for ligand binding.

“Models like AlphaFold are really good with [protein] structures, but we need to put some thought into how we’re going to use them for drug discovery,” says Masha Karelina, a biophysicist at Stanford and co-author of the paper.

Others who spoke to Nature agree that this type of effort offers impressive amounts of data, but they aren’t yet sure about its quality. Biotech announcements such as the one from Recursion aren’t typically accompanied by validation data: confirmation from laboratory experiments that a model has accurately predicted binding. The calculated interactions are also based on predicted, rather than experimentally determined, protein structures, which might not have the atomic-level resolution that drug developers need to pinpoint where the strongest binding might occur. What’s more, the sheer number of predicted interactions (Recursion predicted 2.8 quadrillion) means that even a small percentage of false-positive ‘hits’ can lead to costly delays while scientists spend valuable time trying to validate them, says Brian Shoichet, a pharmaceutical chemist at the University of California, San Francisco.
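A back-of-the-envelope sketch in Python shows why scale matters here. Only the 2.8 quadrillion total comes from Recursion's announcement. The error rates below are hypothetical values picked for illustration, not measured rates for MatchMaker or any other tool, and the function name is likewise invented for this example.

```python
# Total predicted interactions reported by Recursion (figure from the article).
TOTAL_PREDICTIONS = 2_800_000_000_000_000  # 2.8 quadrillion


def expected_false_positives(total: int, rate_per_billion: int) -> int:
    """Expected number of wrongly flagged pairs.

    The rate is given as false hits per billion predictions so that the
    calculation stays in exact integer arithmetic.
    """
    return total * rate_per_billion // 1_000_000_000


# Hypothetical error rates (assumptions, not data): one false hit per
# billion, per million and per thousand predictions.
for rate_per_billion in (1, 1_000, 1_000_000):
    false_hits = expected_false_positives(TOTAL_PREDICTIONS, rate_per_billion)
    print(f"{rate_per_billion:>9,} per billion -> {false_hits:,} false hits")
```

Under these assumed rates, even one error per billion predictions would leave millions of spurious hits, far more than a lab could test one by one.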

The result, Shoichet says, is a lot of excitement, but also a lot of questions.

Unfolding the problem

The idea of using computational tools in drug discovery is to “make it easier, faster and cheaper to play with all the parameters that make a good drug”, says Vsevolod Katritch, a computational biologist at the University of Southern California in Los Angeles. By using AI models to find leads, a drug company might need to test only a few hundred compounds in the lab, instead of thousands. This can shave millions off the cost and bring a compound to market in years instead of decades.

AlphaFold and similar programmes, such as RoseTTAFold, which was developed by an international team led by researchers at the University of Washington’s Institute for Protein Design, promise to shake up the pharmaceutical industry further. The structures of many human proteins had been lacking, which made it difficult to find treatments for some diseases. The programmes have become so good at predicting 3D protein shapes that, of the 200 million protein structures deposited into a database last year, the European Molecular Biology Laboratory’s European Bioinformatics Institute deemed 35% to be highly accurate (as good as experimentally determined structures) and another 45% accurate enough for some applications.

On the surface, making the leap from AlphaFold’s and RoseTTAFold’s protein structures to the prediction of ligand binding doesn’t seem like such a big one, Karelina says. She initially thought that modelling how a small molecule ‘docks’ to a predicted protein structure (which usually involves estimating the energy released during ligand binding) would be easy. But when she set out to test it, she found that docking to AlphaFold models is much less accurate than to protein structures that are experimentally determined¹. Karelina’s still not 100% sure why, but she thinks that small variations in the orientation of amino-acid side chains in the models versus the experimental structures could be behind the gap. When drugs bind, they can also slightly alter protein shapes, something that AlphaFold structures don’t reflect.
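One common way to put a number on docking accuracy is the root-mean-square deviation (RMSD) between the atoms of a predicted ligand pose and the same atoms in the experimentally determined pose. The Python sketch below illustrates that idea only; it is not necessarily the protocol used in the eLife paper. The 2 Å cut-off is a widely used convention and is an assumption here, the coordinates are made up, and the code assumes that both poses list the same atoms in the same order and in the same coordinate frame.

```python
import math

Coord = tuple[float, float, float]


def pose_rmsd(predicted: list[Coord], reference: list[Coord]) -> float:
    """Root-mean-square deviation, in angstroms, between matched atoms."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("poses must be non-empty and list the same atoms")
    squared_sum = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared_sum / len(predicted))


def is_accurate_pose(
    predicted: list[Coord], reference: list[Coord], cutoff: float = 2.0
) -> bool:
    """Count a pose as accurate if its RMSD is within the assumed cut-off."""
    return pose_rmsd(predicted, reference) <= cutoff


# Hypothetical three-atom ligand; the coordinates are illustrative only.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
close_pose = [(x + 1.0, y, z) for x, y, z in reference_pose]  # shifted 1 Å
far_pose = [(x + 3.0, y, z) for x, y, z in reference_pose]    # shifted 3 Å

print(is_accurate_pose(close_pose, reference_pose))
print(is_accurate_pose(far_pose, reference_pose))
```

A benchmark of this kind would dock the same ligands to both predicted and experimental protein structures and compare how often each yields a pose within the cut-off. Real evaluations also have to handle details that this sketch ignores, such as symmetric atoms in the ligand.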

Laksh Aithani, chief executive and co-founder of London-based Charm Therapeutics, agrees with Karelina’s findings that RoseTTAFold and AlphaFold don’t perform well when determining small-molecule docking.

Charm is trying a different way of evaluating protein–drug binding. The technique uses an AI tool, called DragonFold, that is built on a RoseTTAFold backbone. It models the 3D shape of the protein and ligand bound together, which Aithani says allows Charm to account for changes in protein shape that occur with ligand binding and to modify the would-be drug to create tighter, more selective binding. The effort isn’t far enough along for Aithani to reveal many details, but he says the project has attracted the interest of pharmaceutical firm Bristol Myers Squibb, based in Lawrenceville, New Jersey.

The road ahead

In the end, the challenge for these groups, says Shoichet, isn’t to design a model that will identify how well molecules bind, but to create a system that can identify compounds that bind strongly to proteins about which little is known. To make progress, validation in the lab is necessary, he says.

Industry should be able to do the validation, says Bonnie Berger, a mathematician at the Massachusetts Institute of Technology in Cambridge. At the moment, however, if industry is doing it, it isn’t sharing that data.

“There’s a lack of transparency from companies like Recursion, who make predictions without fully sharing their methods or results. It’s a problem for me and for the field,” she says.

Recursion responds that it has shared validation data on MatchMaker in two studies: one in Scientific Reports in 2021², and one in the Journal of Chemical Information and Modeling earlier this year³.

“Sharing these exciting technical milestones in real time as they occur is our way to share how we are thinking about drug discovery with the community and the broader general public,” says Recursion spokesperson Ryan Kelly.

Berger says that competitions such as the one that put AlphaFold on the map could not only help drive drug discovery forwards, but also shed more light on industry’s methods. AlphaFold made headlines when it won the biennial Critical Assessment of protein Structure Prediction (CASP) contest in 2020, in which researchers had to test their prediction models against a set of proteins for which structures were experimentally determined, but not yet publicly released. In the same way, an AI tool’s results for drug–protein interactions could be compared with lab results for binding.

“There’s a huge amount of effort going on” to harness models such as AlphaFold for drug discovery, Shoichet says. But “things are still just ramping up”.

doi: https://doi.org/10.1038/d41586-023-02984-w

This story originally appeared on: Nature. Author: Carrie Arnold