Abstract
INTRODUCTION
Digital voice analysis is an emerging tool for differentiating cognitive states, but it poses privacy risks as automated systems may inadvertently identify speakers.
METHODS
We developed a computational framework to evaluate the trade-off between voice obfuscation and cognitive assessment accuracy, using pitch-shifting as a representative method. This framework was applied to voice recordings from the Framingham Heart Study (FHS, n = 128) and the DementiaBank Delaware (DBD, n = 85) corpus, both featuring responses to neuropsychological tests. Speaker obfuscation was measured via equal error rate (EER), and diagnostic utility was assessed through machine learning models distinguishing cognitive states: normal cognition (NC), mild cognitive impairment (MCI), and dementia (DE).
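As an illustration of the privacy metric named above, the following is a minimal sketch of how an equal error rate can be estimated from speaker-verification similarity scores. It is not the authors' implementation: the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores are assumptions made for this example, and the abstract does not say which verification system produced the scores. In this setting a higher EER means the verification system confuses speakers more often, which corresponds to stronger obfuscation.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER from similarity scores (hypothetical helper).

    genuine_scores: scores for trials where both recordings share a speaker.
    impostor_scores: scores for trials with two different speakers.
    Higher scores are assumed to mean "more likely the same speaker".
    Returns (eer, threshold) at the point where the two error rates are closest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # False acceptance rate: impostor trials scored at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])

    # The EER is taken where FAR and FRR are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, float(thresholds[idx])


# Toy scores, invented purely for illustration.
genuine = [0.2, 0.6, 0.8, 0.9]
impostor = [0.1, 0.3, 0.4, 0.7]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2f} at threshold {threshold:.2f}")
```

Sweeping only the observed scores keeps the sketch short; with many trials, interpolating between adjacent thresholds gives a smoother estimate.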
RESULTS
With the top 20 acoustic features, our framework achieved classification accuracies of 62.2% (EER: 0.3335) on the FHS dataset for NC, MCI, and DE differentiation, and 63.7% (EER: 0.1796) on the DBD dataset for NC and MCI differentiation, using obfuscated speech files.
DISCUSSION
Our results demonstrate the feasibility of privacy-preserving voice markers, offering a scalable solution for voice-based cognitive assessments.
Highlights
We developed a computational framework using pitch-shifting and acoustic transformations to balance speaker privacy and diagnostic utility in voice-based cognitive assessments.
We evaluated the framework on two independent datasets, the Framingham Heart Study (FHS, n = 128) and the DementiaBank Delaware (DBD, n = 85) corpus, assessing the trade-off between privacy (measured by equal error rate [EER]) and classification accuracy.
Our framework achieved classification accuracies of 62.2% (EER: 0.3335) for distinguishing normal cognition (NC), mild cognitive impairment (MCI), and dementia in the FHS dataset and 63.7% (EER: 0.1796) for NC and MCI differentiation in the DBD dataset, using obfuscated speech files.
Our framework demonstrates that appropriate pitch-shifting levels can preserve diagnostic utility while protecting speaker identity, offering a scalable and privacy-preserving solution.
This post is Copyright: Meysam Ahangaran, Nauman Dawalatabad, Cody Karjadi, James Glass, Rhoda Au, Vijaya B. Kolachalama | March 14, 2025