By merging causal genetics with network control theory, this study reveals hidden drivers of long COVID, offering new insight into why the condition affects patients so differently.
Study: Integrated multi-omics framework for causal gene discovery in Long COVID. Image credit: Daisy Daisy/Shutterstock.com
The coronavirus disease 2019 (COVID-19) pandemic has taken a heavy toll on human life and health since 2020. Although the severity of the pandemic has faded, its long-term consequences continue to afflict hundreds of thousands of survivors.
A recent study published in the journal PLOS Computational Biology uses multi-omics tools to examine the genes underlying the risk of long COVID.
Long COVID affects millions with a variety of lingering symptoms
Post-acute sequelae of SARS-CoV-2 infection (PASC), also known as long COVID, refers to persistent or new symptoms that occur after infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It affects up to 20% of infected people, including those whose infection was subclinical.
However, the reported prevalence varies due to different definitions issued by various organizations, including the World Health Organization (WHO) and the National Institute for Health and Care Excellence (NICE).
Symptoms of long COVID include neurological (brain fog, headache, memory problems), respiratory (difficulty breathing, chest tightness, reduced ability to exercise), musculoskeletal (persistent severe fatigue, myalgia, joint pain), cardiovascular (chest pain, rapid heartbeat, fluctuating blood pressure), and minimal inflammatory symptoms.
Known risk factors for long COVID include sex, age, and the presence of pre-existing disease. However, the genetic underpinnings remain unclear, which motivated the current study. Such insights would help develop more accurate diagnostics and inform future personalized treatments for this widespread condition.
Multi-omics data fuel a new causal gene framework
The current study used a custom multi-omics platform that combines two analytical methods: one to identify potential genes associated with long COVID and the other to identify network “driver” genes that exert control over disease-related biological pathways.
The computational platform brought together multiple types of biological data and mathematical methods, which together form a comprehensive framework for analyzing the genetic causes of long COVID.
The methods used in this integrated approach included:
- Transcriptome-wide Mendelian randomization (TWMR) to find genes with evidence of a causal effect on long COVID risk or protection
- Expression quantitative trait loci (eQTLs) to examine how genetic variants influence gene expression
- Genome-wide association studies (GWAS) to identify associations between genetic variants and long COVID risk
- RNA sequencing (RNA-seq) to study actual changes in gene expression in long COVID
- The human protein-protein interaction (PPI) network to explore how proteins interact and to identify key regulatory checkpoints using network control theory
The authors integrated these into a single composite score for each gene:
Final score = α · (TWMR score) + (1 − α) · (CT score)
Here, the CT score is the gene's control theory (network controllability) score, and the parameter α allows users to balance the contribution of direct causation against network controllability.
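To make the weighting concrete, the short Python sketch below applies the formula above to a handful of made-up genes. It is only an illustration: the gene names and score values are hypothetical, and it assumes both scores have already been scaled to the same 0 to 1 range, which the article does not specify. It is not the authors' implementation.

```python
def combined_score(twmr_score, ct_score, alpha=0.5):
    """Weighted blend of a causal (TWMR) score and a control theory (CT) score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * twmr_score + (1 - alpha) * ct_score


def rank_genes(scores, alpha=0.5):
    """Return (gene, final score) pairs sorted from highest to lowest score."""
    finals = [
        (gene, combined_score(twmr, ct, alpha))
        for gene, (twmr, ct) in scores.items()
    ]
    return sorted(finals, key=lambda pair: pair[1], reverse=True)


# Hypothetical genes with made-up (TWMR score, CT score) pairs,
# assumed here to be normalized to the range 0..1.
toy_scores = {
    "GENE_A": (0.9, 0.2),  # strong causal evidence, weak network control
    "GENE_B": (0.4, 0.8),  # weak causal evidence, strong network control
    "GENE_C": (0.6, 0.6),  # moderate on both
}

# A high alpha favors direct causal evidence; a low alpha favors network control.
for alpha in (0.8, 0.2):
    print(alpha, rank_genes(toy_scores, alpha=alpha))
```

With α close to 1 the ranking is dominated by causal evidence, so the hypothetical GENE_A comes first; with α close to 0 the network "driver" GENE_B rises to the top.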
The study prioritizes 32 genes associated with long COVID
The study identified 32 candidate genes that are likely to play a causal role in long COVID. Of these, 19 have been reported by previous researchers, which supports the current findings. The other 13 were identified for the first time and need further study. This set of genes is involved in the host's response to the virus, the ability of the virus to induce cancerous changes in cells, and the regulation of the host's immune response and the cell cycle.
Enrichment analyses showed that the same set of genes is involved in autoimmune and connective tissue disorders, as well as in certain syndromes and metabolic conditions. This overlap may help explain why long COVID presents with such varied symptoms.
The scientists grouped patients by the expression profiles of the causal genes and identified three subtypes of long COVID. These had different symptoms, different underlying disease pathways, and different clinical features.
The researchers developed a free, open-source application in the Shiny framework that lets other users explore, search, and analyze the data with their own filters and parameters. It can be used to generate lists of putative causal genes using either Mendelian randomization or control theory, and it also helps others replicate the findings of the current study.
Combining causality and network biology enhances discovery
The study has several strengths. First, it combines causal inference via Mendelian randomization with network control theory, capturing both the direct effects of causal gene expression and the effects of perturbations on system-wide checkpoints. Second, the use of multi-omics data gives it an advantage over studies based on a single type of data.
In addition, gene discovery was accompanied by the identification of disease subtypes, which adds clinical relevance, and by the development of an interactive tool. The Shiny app lets users tailor the results by specifying how much weight to give to direct causal genes versus regulatory control over the network.
Goals for future diagnoses and treatments
“This comprehensive framework highlights novel causal mechanisms and therapeutic targets, promoting precision medicine strategies for Long COVID,” the authors conclude, while stressing that these findings form the basis for future research.
