Research Highlights
Smartwatches, smartphones, and other wearable devices are transforming the way we track our physical health and behavior. Researchers are also investigating whether these devices could provide insights into our mental health, with the aim of developing AI tools that can help identify when people need mental health support or professional care. However, research supported by the National Institute of Mental Health suggests that AI tools based on smartphone data may struggle to accurately predict clinical outcomes such as depression in large and diverse groups of people.
What did the researchers do?
Lead author Daniel Adler of Cornell University and colleagues from Northwestern University Feinberg School of Medicine, Weill Cornell Medicine, and Michigan Medicine analyzed behavioral data collected from the smartphones of 650 people. Although the sample was larger and more diverse than those of previous studies, participants were mostly female, white, middle- to high-income, and between the ages of 25 and 54.
Smartphone data included behavioral measures related to mobility, phone use, and sleep. Participants also completed the PHQ-8, a standard self-report measure of depressive symptoms.
Building on recent studies, the researchers developed artificial intelligence models that analyzed the smartphone data to create a depression risk score for each participant, indicating the likelihood of clinically significant depression. The researchers then assessed the reliability of the models by identifying age, race, gender, and socioeconomic subgroups for which the model’s predictions were less accurate.
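The subgroup check described above can be illustrated with a small sketch. It is not the study's code: the records are made up, the field names (`risk`, `depressed`, `age_group`) are hypothetical, and the area under the ROC curve (AUC) is used here only as one common way to score how well risk scores separate people with and without clinically significant depression. The `depressed` label is assumed to come from a PHQ-8 cutoff such as a score of 10 or higher.

```python
from collections import defaultdict


def auc(scores, labels):
    """Probability that a random positive case is scored above a random negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(records, key):
    """Compute the AUC separately for each value of a demographic field."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {
        g: auc([r["risk"] for r in rows], [r["depressed"] for r in rows])
        for g, rows in groups.items()
    }


# Synthetic records for illustration only (not data from the study).
records = [
    {"age_group": "18-25", "risk": 0.9, "depressed": 1},
    {"age_group": "18-25", "risk": 0.2, "depressed": 0},
    {"age_group": "18-25", "risk": 0.7, "depressed": 1},
    {"age_group": "18-25", "risk": 0.4, "depressed": 0},
    {"age_group": "65-74", "risk": 0.3, "depressed": 1},
    {"age_group": "65-74", "risk": 0.6, "depressed": 0},
    {"age_group": "65-74", "risk": 0.8, "depressed": 1},
    {"age_group": "65-74", "risk": 0.5, "depressed": 0},
]

overall = auc([r["risk"] for r in records], [r["depressed"] for r in records])
per_group = auc_by_subgroup(records, "age_group")
print("overall:", overall)
print("by age group:", per_group)
```

In this toy data the overall score looks reasonable, yet the risk scores separate cases perfectly in one age group and no better than chance in the other. A single overall accuracy figure can hide exactly this kind of gap.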
What did the researchers find?
Overall, the best-performing AI model was only moderately accurate at predicting who had clinically significant depression (as measured by the PHQ-8). Although the model picked up on some patterns, it consistently underperformed for certain groups of people. For example, the researchers found that the model tended to assign a higher depression risk to people who were older, female, Black or African American, low-income, unemployed, or disabled. Conversely, the model tended to assign a lower depression risk to people who were younger, male, white, high-income, insured, or employed.
To better understand these results, the researchers looked at how the AI model correlated different behaviors with depression risk.
For example, the AI model predicted that higher phone use in the morning was generally associated with a lower risk of depression. However, when the researchers examined the data, they found that this association did not hold across all age subgroups. While higher morning phone use was associated with a lower risk of depression for young adults (ages 18 to 25), it was associated with a higher risk for older adults (ages 65 to 74).
The AI model also predicted that measures of increased mobility, as captured by GPS, were generally associated with a lower risk of depression. However, the underlying data showed that this association did not hold across all income-related subgroups. For people who were from low-income households, disabled, or uninsured, greater mobility was associated with a higher risk of depression.
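A small sketch can show how an association that holds for the pooled sample can reverse inside a subgroup, as in the morning phone use and mobility examples. The numbers below are synthetic, the variable names are hypothetical, and a plain Pearson correlation stands in for whatever the study's models actually learned.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Synthetic data for illustration only: hours of morning phone use and PHQ-8 scores.
data = {
    "18-25": {"morning_use": [1, 2, 3, 4], "phq8": [14, 11, 8, 5]},  # more use, fewer symptoms
    "65-74": {"morning_use": [1, 2, 3, 4], "phq8": [6, 7, 8, 9]},    # more use, more symptoms
}

# Correlation within each age group.
by_group = {g: pearson(d["morning_use"], d["phq8"]) for g, d in data.items()}

# Correlation when both groups are pooled, which is closer to what a
# single population-wide model sees.
pooled_x = [x for d in data.values() for x in d["morning_use"]]
pooled_y = [y for d in data.values() for y in d["phq8"]]
pooled = pearson(pooled_x, pooled_y)

print("pooled:", round(pooled, 3))
print("by age group:", {g: round(r, 3) for g, r in by_group.items()})
```

Here the pooled correlation is negative because the younger group's pattern dominates, while the older group's correlation is positive. A model that learns only the pooled pattern would systematically misjudge the older group.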
What do the findings mean?
The findings highlight the challenges of using AI models based on smartphone data to predict mental health outcomes in a large, diverse group of people. When associations between people’s behavioral patterns and their mental health outcomes vary across demographic groups, AI models may be more likely to make incorrect predictions for some of these groups, leading to skewed results.
According to the researchers, the results highlight the importance of developing AI tools using data from people whose behavioral patterns are similar to those of the target population. One way to increase the effectiveness of AI models may be to develop predictive models that focus on smaller, more targeted populations.
The researchers note that their study focused on associations between behaviors and depression risk across individuals. It is possible that personalized models—models based on behavioral data from an individual over time—may be able to more accurately predict individual depression risk.
Reference
Adler, D. A., Stamatis, C. A., Meyerhoff, J., Mohr, D. C., Wang, F., Aranovich, G. J., Sen, S., & Choudhury, T. (2024). Measuring algorithmic bias to analyze the reliability of AI tools that predict depression risk using smartphone sensed-behavioral data. npj Mental Health Research, 3(17). https://doi.org/10.1038/s44184-024-00057-y