A fish on land still flaps its fins, but the results are significantly different when that fish is in the water. Attributed to renowned computer scientist Alan Kay, the analogy is used to illustrate the power of context in illuminating questions under investigation.
In a first for the field of artificial intelligence (AI), a tool called PINNACLE applies Kay’s insight to understanding the behavior of proteins in their proper context, as determined by the tissues and cells in which those proteins act and with which they interact. In particular, PINNACLE overcomes some of the limitations of current AI models, which tend to analyze how proteins work and malfunction in isolation, one cell and tissue type at a time.
The development of the new AI model, described in Nature Methods, was led by Harvard Medical School researchers.
“The natural world is interconnected, and PINNACLE helps identify these connections, which we can use to gain more detailed knowledge about proteins and safer, more effective drugs. It overcomes the limitations of current context-free models and suggests the future direction for enhancing analyses of protein interactions.”
Marinka Zitnik, senior study author, assistant professor of biomedical informatics at the Blavatnik Institute at HMS
This advance, the researchers note, could deepen our current understanding of the role of proteins in health and disease and illuminate new drug targets for designing more precise, better-tailored therapies.
PINNACLE is freely available to scientists everywhere.
An important step forward
Disentangling the interactions between proteins and the effects of their immediate biological neighbors is difficult. Current analytical tools serve a critical purpose by providing information on the structural properties and shapes of individual proteins. These tools, however, are not designed to address the contextual nuances of the overall protein environment. Instead, they produce context-independent protein representations, meaning they lack information about cell type and tissue type.
However, proteins play different roles depending on the cellular and tissue contexts in which they are found, and on whether that tissue or cell is healthy or diseased. Single-protein representation models cannot identify protein functions that vary across this multitude of contexts.
When it comes to protein behavior, it’s location, location, location
Composed of twenty different amino acids, proteins are the building blocks of cells and tissues and are essential for a range of life-sustaining biological functions, from carrying oxygen throughout the body to contracting muscles for breathing and walking to facilitating digestion and fighting infection, among many others.
Scientists estimate that the number of proteins in the human body ranges from 20,000 to hundreds of thousands.
Proteins interact with each other and with other molecules, such as DNA and RNA.
The interplay between and among proteins creates intricate protein interaction networks. Located within and between cells, these networks engage in extensive crosstalk with other proteins and protein networks.
PINNACLE’s advantage stems from its ability to recognize that protein behavior can vary by cell and tissue type. The same protein may have a different function in a healthy lung cell than in a healthy kidney cell or a diseased colon cell.
PINNACLE sheds light on how these cells and tissues affect the same proteins differently, which is not possible with current models. Depending on the specific cell type in which a protein network resides, PINNACLE can determine which proteins participate in specific conversations and which remain silent. This helps PINNACLE better decode protein crosstalk and type of behavior, and ultimately allows it to predict tightly tailored drug targets for dysfunctional proteins that cause disease.
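The difference between a context-free and a context-aware representation can be sketched as a change in how representations are keyed. The following is a minimal illustration only, not PINNACLE’s code or API: the protein name `PROTEIN_A`, the two-dimensional vectors, and the helper functions are all hypothetical placeholders chosen to make the contrast visible.

```python
import math

# Toy, made-up vectors for illustration; these are NOT PINNACLE outputs.

# Context-free view: one representation per protein, regardless of cell type.
context_free = {"PROTEIN_A": [1.0, 0.0]}

# Context-aware view: one representation per (protein, cell type) pair.
contextual = {
    ("PROTEIN_A", "lung cell"): [1.0, 0.0],
    ("PROTEIN_A", "kidney cell"): [0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def lookup(protein, cell_type):
    """Return the hypothetical representation of a protein in one cell type."""
    return contextual[(protein, cell_type)]


lung = lookup("PROTEIN_A", "lung cell")
kidney = lookup("PROTEIN_A", "kidney cell")

# The same protein can look very different across contexts,
# a distinction the single context-free entry cannot express.
similarity = cosine(lung, kidney)
print(similarity)
```

In this toy setup the context-free dictionary holds a single entry for the protein, while the context-aware one holds a separate entry per cell type, so differences between contexts become something a downstream analysis can measure.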
PINNACLE does not replace but complements single-representation models, the researchers noted, because it can analyze protein interactions across cellular contexts.
Thus, PINNACLE could enable researchers to better understand and predict protein function and help elucidate vital cellular processes and disease mechanisms.
This ability can help identify proteins that can be used as targets for individual drugs, as well as predict the effects of different drugs on different cell types. For this reason, PINNACLE could become a valuable tool for scientists and drug developers to identify potential targets much more efficiently.
Such optimization of the drug discovery process is sorely needed, said Zitnik, who is also an associate faculty member at the Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University.
It can take 10-15 years and cost up to a billion dollars to bring a new drug to market, and the path from discovery to drug is notoriously bumpy with the end result often unpredictable. Indeed, nearly 90 percent of drug candidates do not become drugs.
PINNACLE construction and training
Using human cell data from a comprehensive multi-organ atlas, combined with multiple networks of protein-protein interactions, cell type-to-cell type interactions, and tissues, the researchers trained PINNACLE to produce panoramic graphical protein representations spanning 156 cell types and 62 tissues and organs.
PINNACLE has generated nearly 395,000 multidimensional representations to date, compared to about 22,000 possible representations in current single-protein models. Each of its 156 cell types includes context-rich protein interaction networks of approximately 2,500 proteins.
The current numbers of cell types, tissues, and organs are not the model’s upper limits. The cell types evaluated to date are derived from living human donors and cover most, but not all, cell types of the human body. In addition, many cell types have yet to be identified, while others are rare or difficult to investigate, such as neurons in the brain.
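As a rough sanity check, the figures above are consistent with one another. The sketch below assumes, for illustration only, that every cell type contributes exactly 2,500 protein representations; the article gives that number as an approximation, so the result is an estimate and not the model’s exact count. The function and variable names are hypothetical.

```python
CELL_TYPES = 156                  # cell types covered, per the article
PROTEINS_PER_CELL_TYPE = 2_500    # approximate figure; assumed uniform here
CONTEXT_FREE_TOTAL = 22_000       # approximate size of single-protein models


def estimate_contextual_total(cell_types: int, proteins_per_cell_type: int) -> int:
    """Estimate the number of (protein, cell type) representations."""
    return cell_types * proteins_per_cell_type


estimate = estimate_contextual_total(CELL_TYPES, PROTEINS_PER_CELL_TYPE)
ratio = estimate / CONTEXT_FREE_TOTAL

print(estimate)          # 390000, close to the reported "nearly 395,000"
print(round(ratio, 1))   # 17.7
```

Under this assumption, the contextual approach yields on the order of 18 times as many representations as a context-free model, because each protein is represented once per cell type in which it appears.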
To diversify PINNACLE’s cell repertoire, Zitnik plans to make use of a data platform that includes tens of millions of cells taken from the entire human body.
Journal Reference:
Li, M. M., et al. (2024). Contextual AI models for single-cell protein biology. Nature Methods. doi.org/10.1038/s41592-024-02341-3