Scientists led by Helmholtz Munich have developed an accessible software solution designed specifically for analyzing complex medical health data. The open source software, called “ehrapy,” allows researchers to systematically structure and examine large, heterogeneous datasets. It is available to the global scientific community for use and further development.
Ehrapy is intended to fill a critical gap in the analysis of health data, says Lukas Heumos, one of the lead developers and a scientist at the Institute for Computational Biology at Helmholtz Munich and the Technical University of Munich (TUM): “Until now, there have been no standardized tools for the systematic and efficient analysis of diverse and complex medical data.” The team behind ehrapy comes from a biomedical research background and has extensive experience in analyzing complex scientific data. “The healthcare sector faces similar challenges in data analysis as those working in laboratories,” Heumos noted at the start of the ehrapy project.
Exploratory approach – analysis without hypotheses
Along with many other contributors, Heumos used his expertise in scientific software development to create a solution for analyzing patient data: “Ehrapy can uncover new patterns and generate insights without having to analyze the data based on a specific hypothesis.” This exploratory approach, says Heumos, is a unique feature of ehrapy.
Ehrapy enables researchers to classify, cluster and analyze large, heterogeneous and complex datasets without pre-existing assumptions. This can surface new information that can then be explored further. Heumos explains: “The exploratory approach brings new perspectives to the analysis of health data. Due to their complexity and heterogeneity, these data are often not analyzed as efficiently as they could be.” Ehrapy thus opens new avenues to make health data more useful for medical research and practice.
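To give a sense of what clustering without a prior hypothesis means in practice, here is a minimal sketch in plain Python. It does not use ehrapy or reproduce its API. The patient records, the field choices and the helper names encode and kmeans are all hypothetical, and k-means stands in for whatever methods the package actually offers. The sketch converts mixed numeric and categorical fields into comparable numbers and then lets the algorithm group similar records on its own.

```python
# Illustrative only: toy records and plain k-means, not ehrapy's API.
from math import dist

# Hypothetical records: (age in years, systolic blood pressure in mmHg, smoker flag)
records = [
    (34, 118, "no"), (29, 121, "no"), (41, 125, "no"),
    (67, 158, "yes"), (72, 162, "yes"), (70, 149, "no"),
]

def encode(records):
    """Turn mixed numeric/categorical rows into min-max scaled numeric vectors."""
    rows = [(age, bp, 1.0 if smoker == "yes" else 0.0) for age, bp, smoker in records]
    cols = list(zip(*rows))
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids so runs are repeatable."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# No outcome or hypothesis is supplied; groups emerge from the data alone.
labels = kmeans(encode(records), k=2)
print(labels)
```

On this toy input the three younger records end up in one group and the three older records in the other, although no such split was specified in advance. Real health records are far larger and messier, which is where a dedicated framework comes in.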
The long-term goal: Routine use in clinical practice
Ehrapy was designed from the ground up as open source software. “It was important for us to make the software available to the scientific community from day one,” emphasizes Heumos. The software is available as a Python package on GitHub, an online platform for software development, and can be used and further developed by researchers around the world.
Currently, ehrapy focuses on efficient and fast analysis of research datasets, such as those held at large health research centers. “Routine use in clinical practice is a long-term goal, but for now, we’re focused on providing the research community with a powerful tool,” says Heumos.
In the future, the group plans to provide standardized databases for electronic health records (EHRs). These databases will enable better integration and analysis of large volumes of medical data. Additionally, this will facilitate the development of EHR atlases that can serve as reference datasets for contextualizing and annotating new datasets.
A long journey
“Ehrapy enables comprehensive data analysis across systems, which can be a key step for future AI systems in medicine. So I hope for a relatively quick adoption across various sites,” says Prof. Fabian Theis, Director of the Institute for Computational Biology at Helmholtz Munich and professor at TUM. “The introduction of such technologies in medicine is a long process that can take decades. Our goal is to bridge the gap between biomedical research and practical application in medicine.” Theis further explains that the development team is focusing on exploratory data analysis methods in a holistic format to more easily uncover hidden connections. “We also try to support academic and commercial players in the health sector.”
Journal Reference:
Heumos, L., et al. (2024). An open source framework for end-to-end analysis of electronic health record data. Nature Medicine. doi.org/10.1038/s41591-024-03214-0.