SpeechHide: A Hybrid Privacy-preserving Mechanism for Speech Content and Voiceprint in Speech Data Sharing

Published: 01 Jan 2022, Last Modified: 16 May 2023 | DSC 2022 | Readers: Everyone
Abstract: With the development of speech technology, huge amounts of user-generated speech data are collected by speech service providers and may be used for data sharing. However, speech data contains users’ private information. An attacker may use speaker recognition to identify a target user’s speech data and then analyze its speech content, thereby harming the user’s privacy. In this paper, we propose a privacy-preserving mechanism for speech data that protects users’ speech privacy along two dimensions: speech content and voiceprint. Specifically, we use named entity recognition and KeyBERT keyword extraction to identify words in the speech content that may contain private information, and then replace them with secure words, thus protecting the privacy of users’ speech content. We also design a voiceprint anonymization method based on differential privacy, which perturbs the voiceprint of the source speaker into the voiceprint of another speaker to prevent leakage of the user’s voiceprint. At the same time, the sanitized and anonymized speech data retains good sound quality and data utility, where utility is measured by the accuracy of speech recognition performed on it. Experiments on public datasets validate the effectiveness of the proposed scheme: it reduces the accuracy of speaker recognition by 100% while reducing the accuracy of speech recognition by 14.3%.
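The abstract does not say which differential privacy mechanism is used to map the source voiceprint to another speaker's voiceprint. As a minimal sketch of one possible reading, the code below uses the exponential mechanism to pick a target speaker from a candidate pool, favoring candidates whose voiceprint vectors are closer to the source. The function names, the Euclidean utility, the toy two-dimensional vectors, and the `sensitivity` value are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length voiceprint vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def selection_probabilities(source, candidates, epsilon, sensitivity=1.0):
    """Exponential-mechanism probabilities over candidate speaker ids.

    Utility is the negative distance to the source voiceprint (an assumption).
    `sensitivity` must bound how much the utility can change between
    neighboring inputs; 1.0 is a placeholder the caller has to justify.
    """
    scores = {sid: -euclidean(source, vec) for sid, vec in candidates.items()}
    best = max(scores.values())
    # Subtracting the best score keeps exp() numerically stable and does not
    # change the normalized probabilities.
    weights = {
        sid: math.exp(epsilon * (score - best) / (2.0 * sensitivity))
        for sid, score in scores.items()
    }
    total = sum(weights.values())
    return {sid: w / total for sid, w in weights.items()}


def pick_target_speaker(source, candidates, epsilon, rng=None):
    """Sample one candidate speaker id according to the probabilities above."""
    rng = rng or random.Random()
    probs = selection_probabilities(source, candidates, epsilon)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    # Hypothetical toy voiceprints; real embeddings have many more dimensions.
    pool = {"spk_a": [0.0, 0.0], "spk_b": [3.0, 4.0]}
    print(selection_probabilities([0.0, 0.0], pool, epsilon=1.0))
    print(pick_target_speaker([0.0, 0.0], pool, epsilon=1.0))
```

A smaller `epsilon` makes the choice closer to uniform over the pool (stronger privacy, less similarity to the source), while a larger `epsilon` concentrates probability on the nearest candidates. Converting the speech to the selected speaker's voice is a separate step that this sketch does not cover.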