Abstract: The rise of synthetic speech in audio-based NLP tasks has raised critical questions about its robustness, fidelity, and fairness. This study empirically examines the relationship between Text-to-Speech (TTS) and Speech-to-Text (STT) models using hate and non-hate speech data. Our evaluation focuses on three key dimensions: (1) STT robustness, assessing the accuracy and gender sensitivity of STT models when transcribing synthetic versus human audio; (2) TTS synthetic audio fidelity, examining human-likeness and model preference through annotator evaluations and processing-speed analysis; and (3) impact on hate speech classification, quantifying how STT and TTS combinations affect downstream toxicity predictions. Our findings show that synthetic audio, especially from Microsoft Edge TTS, outperforms human audio in both transcription accuracy and consistency. WhisperX-Align (an extension of OpenAI’s Whisper model) emerges as the most robust STT model across tasks, although some systems exhibit notable gender- and domain-specific biases. We recommend Microsoft Edge TTS as a high-fidelity benchmark and SpeechT5 as a human proxy for perceptual evaluation, while highlighting the need for bias-aware deployment in sensitive applications such as hate speech detection. The implementation code is publicly available at https://anonymous.4open.science/r/Can-AI-Replace-Human-Speech-D0EF/.
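To make the evaluated pipeline concrete, the sketch below strings the three dimensions together on a single utterance: Edge TTS synthesis, Whisper transcription scored by word error rate, and a toxicity prediction on the transcript. This is a minimal illustration, not the paper's implementation: the sample text, the `en-US-AriaNeural` voice, the `base` Whisper checkpoint (standing in for WhisperX-Align), and the `unitary/toxic-bert` classifier are all assumptions made here for the example.

```python
import asyncio

import edge_tts                     # Microsoft Edge TTS client
import whisper                      # OpenAI Whisper STT
from jiwer import wer               # word error rate metric
from transformers import pipeline   # downstream toxicity classifier

# Hypothetical sample; the paper's actual hate/non-hate data is not shown here.
TEXT = "This is a sample utterance from the evaluation set."
VOICE = "en-US-AriaNeural"          # assumed Edge TTS voice


async def synthesize(text: str, path: str) -> None:
    # Render synthetic speech with Microsoft Edge TTS.
    await edge_tts.Communicate(text, VOICE).save(path)


def main() -> None:
    asyncio.run(synthesize(TEXT, "synthetic.mp3"))

    # Transcribe the synthetic audio back to text with Whisper
    # (plain Whisper stands in for WhisperX-Align in this sketch).
    stt = whisper.load_model("base")
    hypothesis = stt.transcribe("synthetic.mp3")["text"]

    # STT robustness: word error rate against the reference text.
    print("WER:", wer(TEXT.lower(), hypothesis.lower().strip()))

    # Downstream impact: toxicity prediction on the transcript.
    # "unitary/toxic-bert" is an assumed stand-in classifier.
    clf = pipeline("text-classification", model="unitary/toxic-bert")
    print("Toxicity:", clf(hypothesis)[0])


if __name__ == "__main__":
    main()
```

Running the same loop over human recordings and over multiple TTS/STT pairs would yield the kind of transcription-accuracy and downstream-toxicity comparisons the abstract describes.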
Paper Type: Long
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: automatic speech recognition, speech technologies, model bias/fairness evaluation
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 33