A group of researchers led by Elke Rundensteiner has developed a technology that screens voice recordings for signs that a speaker is depressed, an advance that could alert physicians and other clinicians to people who need help.
Audio-assisted Bidirectional Encoder Representations from Transformers (AudiBERT), the system developed by the researchers, leverages the words a speaker uses as well as the speaker’s tone, says Rundensteiner, William Smith Dean's Professor of Computer Science and founding director of WPI’s Data Science Program.
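The two-signal idea described above can be illustrated with a minimal fusion sketch. This is not AudiBERT's actual architecture; it simply assumes we already have a text embedding (as a BERT-style encoder would produce from the speaker's words) and a vector of audio/tone features, and shows how the two modalities can be concatenated into one joint representation before a classifier scores the recording. All names and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_modalities(text_emb: np.ndarray, audio_feats: np.ndarray) -> np.ndarray:
    """Late fusion by concatenation: one joint feature vector per recording."""
    return np.concatenate([text_emb, audio_feats])

def screen(joint: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Toy linear screener returning a depression-risk score in (0, 1)."""
    return float(1.0 / (1.0 + np.exp(-(joint @ weights + bias))))

# Stand-ins for real model outputs (dimensions chosen only for illustration):
text_emb = rng.standard_normal(8)     # would be e.g. a 768-d BERT embedding
audio_feats = rng.standard_normal(4)  # would be prosodic/tone statistics
joint = fuse_modalities(text_emb, audio_feats)

weights = rng.standard_normal(joint.shape[0])  # untrained toy weights
score = screen(joint, weights, bias=0.0)
print(f"risk score: {score:.3f}")
```

In a real system the weights would be learned from labeled interviews; the point of the sketch is only that word-based and tone-based evidence end up in a single vector that one classifier can act on.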
“Clinicians can detect depression and other mental ailments based on the content and tone of interviews with patients,” Rundensteiner says. “With deep learning data science techniques, we have developed a digital technology that examines a speaker’s words and tone for signs of depression. If widely deployed, this tool could dramatically expand mental health screening at low costs.”
The researchers’ innovation was selected for presentation in November 2021 at the Association for Computing Machinery Conference on Information and Knowledge Management, where it received the Best Applied Research Award. The authors are Rundensteiner; Ermal Toto ’21 (PhD), previously a graduate student in computer science with Rundensteiner and now WPI assistant director of academic research computing; and ML Tlachac, a PhD student in data science with Rundensteiner. Tlachac has accepted a position as an assistant professor at Bryant University.
AudiBERT builds on the researchers’ earlier work, which demonstrated the feasibility of using machine learning to analyze voice samples and other digital data from smartphones and social media, and explored audio-based depression screening as a way to address the twin societal problems of widespread depression and limited mental health resources. At the core of the research is the idea that a person’s voice can reveal hidden issues.
“If a person is depressed, their vocal tone becomes a monotone,” Toto says. “Their voice might jitter, or shake, a little bit. Trained clinicians can intuitively detect these variables during conversations. Now we can automate the detection in the human voice through machine learning models.”
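The two vocal cues Toto describes can be sketched as simple measurements on a pitch track. This is an illustrative toy example, not the researchers' feature set: "monotone" is approximated as low variability of the fundamental frequency (f0) over time, and "jitter" as small cycle-to-cycle perturbations of the pitch period, a standard definition in speech analysis. The synthetic pitch tracks below stand in for values a pitch-estimation step would extract from real audio.

```python
import numpy as np

def f0_variability(f0_track: np.ndarray) -> float:
    """Standard deviation of the pitch track; near zero for a monotone voice."""
    return float(np.std(f0_track))

def jitter(periods: np.ndarray) -> float:
    """Mean absolute difference between consecutive pitch periods,
    relative to the mean period."""
    return float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))

# Synthetic pitch tracks (Hz): an expressive voice vs. a flat, monotone one.
t = np.linspace(0.0, 1.0, 100)
expressive_f0 = 120.0 + 20.0 * np.sin(2.0 * np.pi * 2.0 * t)
monotone_f0 = np.full(100, 120.0)

# Synthetic pitch periods (seconds): steady vs. slightly perturbed cycles.
steady_periods = np.full(50, 1.0 / 120.0)
rng = np.random.default_rng(1)
shaky_periods = steady_periods * (1.0 + 0.02 * rng.standard_normal(50))

print(f0_variability(expressive_f0) > f0_variability(monotone_f0))  # True
print(jitter(shaky_periods) > jitter(steady_periods))               # True
```

A learned model would of course use many more features than these two, but the sketch shows how the intuitions clinicians apply by ear reduce to quantities a machine can compute.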