Abstract: There have been concerns about how speech data is collected and shared in the real world, since human speech carries personally identifiable information about the speaker and can be used to reliably estimate speaker meta information. In this paper, we explore three different types of methods for DNN-based speaker meta information estimation and compare the estimation results between the original speech and the anonymized speech. We use a McAdams coefficient-based signal processing technique to produce the anonymized speech as privacy-preserving data. Experiments on the TIMIT dataset show a slight degradation in performance on anonymized speech relative to the original. The experiments also reveal that a model employing both DNN-based embedding and voice anonymization can achieve performance comparable to that of a model using the original speech.
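The abstract does not spell out the anonymization step, so the following is only a minimal sketch of the pole-angle transformation commonly associated with McAdams coefficient-based anonymization, not the authors' implementation. It assumes a frame's linear-prediction (LPC) coefficients are already available; the function name `mcadams_shift_poles` and the value `alpha=0.8` are illustrative assumptions. Framing, LPC analysis, and resynthesis of the waveform are omitted.

```python
import numpy as np


def mcadams_shift_poles(lpc_coeffs, alpha=0.8):
    """Warp the angles of the complex LPC poles by the exponent alpha.

    lpc_coeffs: coefficients [1, a1, ..., ap] of the LPC polynomial A(z).
    alpha: McAdams coefficient (hypothetical default, not from the paper).
    Returns the coefficients of the modified polynomial.
    """
    poles = np.roots(lpc_coeffs)

    # Only poles with a non-zero imaginary part are moved; real poles stay put.
    is_complex = np.abs(poles.imag) > 1e-8

    angles = np.angle(poles)
    new_angles = angles.copy()
    # phi -> sign(phi) * |phi|**alpha keeps conjugate pairs conjugate.
    new_angles[is_complex] = (
        np.sign(angles[is_complex]) * np.abs(angles[is_complex]) ** alpha
    )

    # Pole magnitudes are unchanged, so a stable filter stays stable.
    new_poles = np.abs(poles) * np.exp(1j * new_angles)

    # Conjugate pairs give a real polynomial up to rounding error.
    return np.real(np.poly(new_poles))


# Toy example: one conjugate pole pair at radius 0.9, angle 0.5 rad.
demo = np.real(np.poly([0.9 * np.exp(1j * 0.5), 0.9 * np.exp(-1j * 0.5)]))
shifted = mcadams_shift_poles(demo, alpha=0.8)
```

With `alpha=1.0` the coefficients are returned unchanged; values further from 1 move the spectral peaks further from their original positions, which is what alters the perceived voice.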
External IDs:dblp:conf/bigcomp/BaegHJ22