Keywords: Automatic Speaker Verification (ASV), Deep Neural Networks (DNN), SV2TTS, Resemblyzer, Mean Opinion Score (MOS), Equal Error Rate (EER)
TL;DR: The paper examines the impact of deepfake audio with African accents on ASV systems, using SV2TTS as the synthesis model and Resemblyzer as the ASV system.
Abstract: Automatic Speaker Verification (ASV) systems are vital for seamless speech-based authentication in digital systems. However, the rise of deep neural network (DNN)-based voice synthesis has introduced the risk of deepfake audio that convincingly mimics human voices, posing a significant threat to both individual identities and the security of ASV systems. To address this, we conducted an extensive study of the impact of deepfake audio with African accents on ASV systems. The findings reveal that modern ASV systems, such as Resemblyzer, are less susceptible to deception by deepfake audio with African accents. These results highlight the need to develop deepfake audio systems that accurately simulate authentic African accents, so that such technology can be used effectively to address modern challenges in Africa.
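The abstract does not spell out the evaluation protocol, so what follows is only a rough illustration of how the Equal Error Rate listed in the keywords is commonly obtained. An ASV system typically scores each trial by comparing speaker embeddings (for example with cosine similarity), and the EER is the operating point where the false acceptance rate on spoofed or impostor trials equals the false rejection rate on genuine trials. This sketch assumes the scores are already computed. It does not call Resemblyzer or reproduce the paper's pipeline, and the function names `cosine_similarity` and `equal_error_rate` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores.

    genuine:  scores for same-speaker (bona fide) trials, which should be accepted
    impostor: scores for spoofed or different-speaker trials, which should be rejected
    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        # False acceptance rate: fraction of impostor/deepfake trials accepted.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection rate: fraction of genuine trials rejected.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where FAR and FRR are closest; report their mean.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not results from the paper.
    genuine_scores = [0.6, 0.7, 0.8, 0.9]
    deepfake_scores = [0.5, 0.6, 0.65, 0.7]
    print(equal_error_rate(genuine_scores, deepfake_scores))
```

Under this convention, a higher EER on deepfake trials means the verifier separates them less cleanly from genuine speech, and a lower EER means it rejects them more reliably. Real evaluations usually interpolate the ROC curve rather than picking the closest observed threshold.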
Submission Category: Machine learning algorithms
Submission Number: 13