Abstract: The rapid advancement of artificial speech synthesis, fueled by generative AI (GenAI), presents both opportunities and threats to society. While these technologies offer unprecedented capabilities, they have also been exploited to create "DeepFake" speech for fraud, impersonation, and disinformation, as evidenced by recent real-world incidents. Our research addresses these emerging threats by exploring a novel, proactive approach to disrupting unauthorized speech synthesis. Grounded in adversarial robustness theory, the core defense strategy is to embed imperceptible "voice cloaks" into users' speech; these perturbations are designed to prevent accurate voice cloning when the audio is used in unauthorized synthesis pipelines. This concept has been realized and validated in our preliminary work, AntiFake, demonstrating its initial feasibility. Building on this foundation, we propose a line of research that seeks to understand the fundamental three-way trade-off among protection generalizability, audio quality, and computational efficiency, and to achieve balanced improvements across all three dimensions.
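To make the cloaking mechanism concrete, the sketch below uses a standard PGD-style adversarial perturbation under an L-infinity budget: it nudges the waveform so that a speaker encoder's embedding drifts away from the true speaker's identity while the change stays small enough to be imperceptible. This is a minimal illustration, not AntiFake's actual method: `ToySpeakerEncoder`, the cosine-similarity loss, and the hyperparameters (`eps`, `alpha`, `steps`) are all hypothetical stand-ins for the real speaker encoders and perceptual constraints a deployed system would target.

```python
import torch
import torch.nn as nn


class ToySpeakerEncoder(nn.Module):
    """Hypothetical stand-in for a speaker encoder inside a voice-cloning
    system; it exists only to make this sketch self-contained."""

    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, 1, samples) -> L2-normalized speaker embedding
        emb = self.net(wav)
        return emb / emb.norm(dim=-1, keepdim=True)


def cloak(wav, encoder, eps=0.002, alpha=2e-4, steps=100):
    """PGD-style search for an imperceptible perturbation that pushes the
    speaker embedding away from the original speaker. All hyperparameters
    here are illustrative assumptions."""
    target = encoder(wav).detach()  # identity embedding to move away from
    delta = torch.zeros_like(wav, requires_grad=True)
    for _ in range(steps):
        emb = encoder(wav + delta)
        # Similarity to the true speaker: we step to *reduce* it.
        loss = torch.cosine_similarity(emb, target, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # signed-gradient descent step
            delta.clamp_(-eps, eps)             # L-inf bound keeps the cloak inaudible
            delta.grad.zero_()
    return (wav + delta).detach()


# Usage: cloak one second of 16 kHz audio (random data as a placeholder).
encoder = ToySpeakerEncoder().eval()
for p in encoder.parameters():
    p.requires_grad_(False)  # only the perturbation is optimized
speech = torch.randn(1, 1, 16000) * 0.1
protected = cloak(speech, encoder)
```

In a real pipeline the perturbation would be optimized against one or more production speaker encoders, and the fixed `eps` bound would be replaced or supplemented by a perceptual quality constraint; that substitution is precisely where the trade-off among protection generalizability, audio quality, and computational efficiency arises.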