Keywords: audio, deepfake, generative AI, detection
TL;DR: Audio deepfake detectors are not robust under real-world conditions: marginal alterations such as sound-level changes, added noise, newer generative models, and transmission effects degrade detection.
Abstract: The misuse of generative AI (genAI) has raised significant ethical and trust issues. To mitigate this, substantial focus has been placed on detecting generated media, including fake audio. In this paper, we examine the efficacy of state-of-the-art fake audio detection methods under real-world conditions. By analyzing typical audio alterations of transmission pipelines, we identify several vulnerabilities: (1) minimal changes such as sound level variations can bias detection performance, (2) inevitable physical effects such as background noise lead to classifier failures, (3) classifiers struggle to generalize across different datasets, and (4) network degradation affects the overall detection performance. Our results indicate that existing detectors have major issues in differentiating between real and fake audio in practical applications and that significant improvements are still necessary for reliable detection in real-world environments.
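Two of the alterations discussed in the abstract, sound-level variation and added background noise, are simple to reproduce. The sketch below (an illustrative assumption, not the paper's actual evaluation pipeline) applies a decibel gain change and white Gaussian noise at a target SNR to a waveform, as one might do when probing a detector's robustness:

```python
import numpy as np

def apply_gain(wave, gain_db):
    """Scale the waveform by a gain in decibels (sound-level variation)."""
    return wave * 10.0 ** (gain_db / 20.0)

def add_noise(wave, snr_db, seed=None):
    """Add white Gaussian background noise at a target signal-to-noise ratio (dB)."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

# Example: perturb a 1-second 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)

quieter = apply_gain(wave, -6.0)          # -6 dB, roughly halves the amplitude
noisy = add_noise(wave, snr_db=10.0, seed=0)
```

Feeding such perturbed copies of both real and generated audio to a detector, and comparing scores against the unperturbed versions, is one way to expose the sensitivity the abstract reports.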
Submission Number: 68