Important mental health history is often present in medical records but is difficult to find, especially when it is missing the diagnosis codes that clinicians, researchers, and health systems use to search for and count conditions.
A new study led by researchers at the University of New Mexico School of Medicine analyzed electronic health records for more than 1.3 million patients served by the Veterans Health Administration (VHA). Highlighting a common gap in how health systems track self-harm, the researchers found that diagnosis codes captured only about a quarter of clinically documented histories of self-harm.
“For research and planning, if we only measure what is easy to see in diagnosis codes, we may substantially underestimate the need for mental health services,” said Christophe Lambert, PhD, professor and interim director of the Division of Translational Informatics in the Department of Internal Medicine at the UNM School of Medicine, and the study’s corresponding author. “Better measurement can help health systems plan better, help researchers study care more accurately, and ultimately help clinicians know when a patient might need a closer look.”
The study, published in the Journal of Medical Internet Research, used a new machine learning method previously developed by members of the research team. After expert chart review and statistical calibration, the researchers estimated that documented self-injury was present in about 7.9% of patients seen by VHA clinicians – more than four times the 1.85% visible through diagnosis codes alone. The gap matters because lost history can affect clinical awareness, research findings, and planning of mental health services.
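The two percentages above can be checked against the "about a quarter" and "more than four times" descriptions with a few lines of arithmetic. This is only an illustrative sketch: the variable names are made up for this example, and the only inputs are the two figures quoted in the article.

```python
# Figures quoted in the article, as a percentage of patients.
coded_pct = 1.85      # self-injury visible through diagnosis codes alone
estimated_pct = 7.9   # calibrated estimate of documented self-injury

# How many times larger the estimate is than the coded rate.
ratio = estimated_pct / coded_pct

# Share of documented history that diagnosis codes capture.
share_captured = coded_pct / estimated_pct

print(f"estimate is {ratio:.2f} times the coded rate")          # 4.27
print(f"codes capture {share_captured:.1%} of documented history")  # 23.4%
```

Both results are consistent with the article's wording: roughly 4.3 times, and a little under one quarter.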
Problem lists – the notes providers make of their patients’ health conditions – showed another visibility gap. They are intended to highlight important conditions for clinical teams, but in actual care they are not always complete or consistently maintained. Among veterans with a diagnosis code for self-injury, only 22.6% had self-injury or a history of self-injury on the VHA problem list. This means that even when self-harm appeared in the diagnosis codes, it was often missing from one of the more visible summary fields of the record.
Past self-injury matters clinically because it is one of the most important predictors of future self-injury and suicide risk. It can also shape how care is delivered, including how clinicians think about depression, PTSD, bipolar disorder, substance use, traumatic brain injury, and other conditions that can co-occur with self-injury.
The authors note that the VHA already uses specialized suicide and overdose reporting tools and does not rely solely on diagnosis codes or problem lists to track suicide risk. This study addressed a different but related question: How much past history of self-injury is visible in the parts of the record that researchers, care teams, and health systems can most easily quantify and review at scale?
“This is a system-wide visibility issue,” Lambert said. “The file can be huge. In our chart review, some patient records had more than 500,000 lines of notes. No clinician can be expected to read all of that during a normal visit.”
The study did not attempt to predict future self-harm or determine with certainty whether any patient had self-harmed. Instead, the team looked at whether a computer model could use patterns in structured electronic health record data to estimate the probability that a history of self-harm was present but missing from diagnosis codes. The team then compared those probabilities with expert review of clinical notes.
To do this, the team used a method called PULSNAR (Positive Unlabeled Learning Selected Not At Random), built for messy real-world health data. Most machine learning methods need clear examples of “yes” and “no” cases. But in medical records, a missing diagnosis code does not prove that a patient never had the condition.
PULSNAR works with this uncertainty. It learns from patients who have a code and then calculates how many similar patients there might be among those who don’t have a code. Its main advantage is that it does not assume that coded cases are random and allows for the fact that some cases are more likely to be coded than others.
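The basic idea of counting "similar" uncoded patients can be illustrated with a toy calculation. The sketch below is not PULSNAR and is not the study's code. It makes the simpler assumption that coded patients are representative of everyone with the condition, which is exactly the assumption PULSNAR is described as avoiding. All numbers, bin counts, and names are hypothetical.

```python
def estimate_positive_fraction(pos_hist, unl_hist):
    """Return the largest alpha for which unl_hist - alpha * pos_hist
    stays non-negative in every bin (capped at 1.0).

    pos_hist: share of coded patients in each model-score bin
    unl_hist: share of uncoded patients in each model-score bin
    """
    ratios = [u / p for p, u in zip(pos_hist, unl_hist) if p > 0]
    return min(1.0, min(ratios))


# Hypothetical share of patients in four model-score bins, lowest to highest.
coded_scores = [0.10, 0.20, 0.30, 0.40]    # patients with a self-injury code
uncoded_scores = [0.42, 0.28, 0.22, 0.08]  # patients without a code

alpha = estimate_positive_fraction(coded_scores, uncoded_scores)

n_uncoded = 10_000  # hypothetical number of uncoded patients
likely_missed = round(alpha * n_uncoded)

print(f"estimated fraction of uncoded patients resembling coded ones: {alpha:.2f}")
print(f"about {likely_missed} of {n_uncoded} uncoded patients")
```

In this made-up example the highest-score bin limits the estimate: uncoded patients have only 0.08 of their mass there compared with 0.40 for coded patients, so at most about 20% of the uncoded group can look like the coded group. A method that allows some cases to be more likely to be coded than others would need to go beyond this single-population comparison.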
“Medical records can make self-injury hard to see in more ways than one,” said Praveen Kumar, PhD, the study’s first author. “Sometimes the history is found in a clinician’s note but not in the diagnosis codes. Other times, the record may contain risk factors, injuries, poisonings, or behaviors consistent with self-injury, even though the record alone does not prove what happened or why.”
“Our method can help flag both patterns for testing. This study could verify the first pattern because the evidence was already present in the notes. The second pattern may be just as important, but confirming it would require talking to patients or using information beyond the medical record.”
The research team included experts from the UNM Health Sciences Center, Raymond G. Murphy Veterans Affairs (VA) Medical Center, Vanderbilt University Medical Center, VA Tennessee Valley Health System, VA Office of Mental Health, Greer Black Company and UNM’s Department of Economics. The team brought together expertise in medical informatics, computer science, psychiatry, biomedical informatics, economics, statistics and health services research.
The self-injury study is part of a larger research project that uses positive-unlabeled learning to find conditions that may be underreported in standard medical data, the researchers said. The team has already published a related study that used this approach to detect undercoded opioid use disorder, and ongoing work is expanding it to other conditions where the medical record may not show the full picture, including undiagnosed PTSD, depression, bipolar disorder, and sleep disorders.
The method could complement broader VHA mental health and suicide prevention efforts by adding a scalable way to measure conditions that may be underreported or difficult to see in standard medical data. The researchers stressed that the method is still a research tool and not ready to be used on its own in clinical care, although with further development, it could help health systems better assess underreported mental health conditions, find documented history that is not clearly visible, and identify records that may warrant closer examination.
“The history of self-injury is too important to be buried in records that are not practical to review line by line during routine care,” Lambert said. “Our job is to help researchers and health systems find documented history and clinically relevant patterns in data so that care teams have a more complete picture of the people they serve.”
