Anonymizing Dysarthric Speech: Investigating the Effects of Voice Conversion on Pathological Information Preservation
Abstract: Acquiring speech data is a crucial step in the development of speech recognition systems and related speech-based machine learning models. However, protecting speaker privacy is a growing concern that must be addressed. This study investigates voice conversion (VC) as a strategy for anonymizing the speech of individuals with dysarthria. We specifically focus on training a variety of VC models using self-supervised speech representations, such as Wav2Vec2.0 and its multi-lingual variant, XLSR. The converted voices maintain a word error rate within 1% of that of the original recordings. The Equal Error Rate (EER) showed a significant increase, from 1.52% to 41.18% on the LibriSpeech test set and from 3.75% to 42.19% on speakers from the VCTK corpus, indicating a substantial decrease in speaker verification performance. A similar trend is observed with dysarthric speech, where the EER increased from 16.45% to 43.46%. Additionally, our study includes classification experiments on dysarthric vs. healthy speech data to demonstrate that anonymized voices can still yield speech features essential for distinguishing between healthy and pathological speech. The impact of voice conversion is analyzed across aspects of speech such as articulation, prosody, phonation, and phonology.
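Since EER is the central anonymization metric reported above, the minimal sketch below illustrates one common way to estimate it from speaker-verification trial scores. The function name, the scikit-learn-based implementation, and the toy labels/scores are illustrative assumptions and are not taken from the study.

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """Equal Error Rate: the operating point of a speaker-verification
    system where the false-acceptance rate equals the false-rejection rate.
    `labels` marks genuine (1) vs. impostor (0) trials; `scores` are the
    system's similarity scores for the same trials."""
    fpr, tpr, _ = roc_curve(labels, scores, pos_label=1)
    fnr = 1.0 - tpr  # false-rejection rate at each threshold
    idx = np.nanargmin(np.abs(fnr - fpr))  # threshold where FAR and FRR are closest
    return (fpr[idx] + fnr[idx]) / 2.0

# Hypothetical trial scores (e.g., cosine similarities between speaker embeddings).
labels = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.82, 0.74, 0.40, 0.55, 0.31, 0.45])
print(f"EER: {compute_eer(labels, scores) * 100:.2f}%")
```

An EER approaching 50% means the verification system can no longer reliably distinguish genuine trials from impostor trials, which is the desired outcome for anonymization; the increases reported in the abstract should be read in that light.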