Hidden-in-Wave: A Novel Idea to Camouflage AI-Synthesized Voices Based on Speaker-Irrelative Features
Abstract: Voice is an essential medium for human communication and collaboration, and its trustworthiness is of great importance. Synthesizing fake voices and detecting synthesized voices are two sides of the same coin, and both have advanced rapidly with recent deep learning techniques. Attackers have started using AI techniques to synthesize, and even clone, human voices. Researchers have in turn proposed a series of AI-synthesized voice detection approaches that achieve promising results in laboratory environments. In this paper, we introduce the concept of speaker-irrelative features (SiFs) and a novel detection-bypass idea to camouflage AI-synthesized voices: replacing the SiFs of AI-synthesized voices with crafted ones. We implement a proof-of-concept framework named SiF-DeepVC based on this idea. Experiments show that existing detection systems consider the voices output by SiF-DeepVC more human-like than genuine human voices, which shows that our detection-bypass idea is effective and that SiFs are noteworthy in camouflaging AI-synthesized voices.