Exploring Acoustic Reverse Nonlinearity Against Speech Forgery in Real-Time Voice Applications

Ming Gao, Lingfeng Zhang, Yike Chen, Sifeng He, Feng Qian, Lei Yang, Fu Xiao, Jinsong Han

Published: 01 Jan 2025, Last Modified: 21 Sept 2025, Venue: INFOCOM 2025, License: CC BY-SA 4.0

Abstract: Unauthorized editing of speech recordings poses a significant threat to the security and authenticity of speech, particularly in forensic and legal settings. Worse, advances in AI techniques (e.g., audio deepfakes) put speech at growing risk of tampering, and ordinary users have difficulty guaranteeing that what they said has not been illegally altered. Audio watermarking is a recognized active defense against speech forgery. However, existing watermarking techniques degrade audio quality and cannot insert watermarks in real time, so they cannot be adopted in real-time voice applications to protect against forgery of remote recordings, e.g., phone calls, live broadcasts, and online meetings. Fortunately, high-definition (HD) audio techniques provide ultrasonic bands without distortion, which makes ultrasonic creditable factors usable. We propose an audio tamper-proof system named Aegis. It gives commodity mobile devices (e.g., smartphones) an effective method for inserting inaudible creditable factors in real time. Users can then claim that audio with missing or mismatched ultrasound is invalid and illegal. In particular, we explore the acoustic reverse-nonlinear phenomenon, in which audible signals can be modulated onto the ultrasonic spectrum. By emphasizing the correlation between speech signals and ultrasound, we realize an effective defense against various tampering methods.
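
The idea of tying an ultrasonic component to the speech it accompanies can be illustrated with a minimal sketch. This is not the Aegis design: it replaces the acoustic reverse-nonlinear effect with ordinary amplitude modulation, and the sample rate, carrier frequency, modulation depth, test tones, and all function names are assumptions chosen only for illustration. It shows that a signal recovered from the ultrasonic band correlates strongly with the matching audible signal and weakly with a different one.

```python
import math

FS = 96_000   # assumed HD-audio sample rate in Hz
FC = 24_000   # assumed ultrasonic carrier in Hz (exactly 4 samples per cycle at FS)
DEPTH = 0.5   # assumed modulation depth


def tone(freq_hz, n_samples, fs=FS):
    """Pure sine tone used here as a stand-in for a speech signal."""
    return [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]


def modulate(speech, fc=FC, fs=FS, depth=DEPTH):
    """Amplitude-modulate the audible signal onto the ultrasonic carrier."""
    return [(1 + depth * s) * math.cos(2 * math.pi * fc * n / fs)
            for n, s in enumerate(speech)]


def demodulate(ultra, fc=FC, fs=FS, depth=DEPTH):
    """Coherent demodulation: mix with the carrier, then average over one carrier period."""
    mixed = [2 * u * math.cos(2 * math.pi * fc * n / fs)
             for n, u in enumerate(ultra)]
    win = round(fs / fc)  # moving-average length that suppresses the 2*fc term
    out = []
    for n in range(len(mixed) - win + 1):
        avg = sum(mixed[n:n + win]) / win
        out.append((avg - 1) / depth)  # remove the carrier offset and undo the depth
    return out


def correlation(a, b):
    """Pearson correlation over the common length of two sequences."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


speech = tone(1_000, 1_920)      # 20 ms of a 1 kHz tone
forged = tone(1_700, 1_920)      # a different audible signal, standing in for tampered audio
recovered = demodulate(modulate(speech))

print("matching audio:  ", correlation(recovered, speech))
print("mismatched audio:", correlation(recovered, forged))
```

In this toy setting, a verifier would accept audio only when the correlation between the audible band and the content recovered from the ultrasonic band is high; the threshold and the real modulation mechanism are left to the paper.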

External IDs: dblp:conf/infocom/GaoZCHQYXH25