Perceptually Constrained Fast Adversarial Audio Attacks

Published: 01 Jan 2021 (ICMLA 2021). License: CC BY-SA 4.0.
Abstract: Audio adversarial attacks on deep learning models are of great interest given the commercial success and proliferation of these technologies. Such attacks have been successfully demonstrated; however, the artifacts they introduce into the adversarial audio are easily detectable by a human observer. In this work, we propose an expansion of the fast audio adversarial perturbation framework that produces black-box attacks that are imperceptible to a human observer in near real time. This is achieved through a perceptually motivated penalty function: we propose a perceptual fast audio adversarial perturbation generator (PFAPG) that employs a loudness-constrained loss function, in lieu of a conventional L2 norm, between the adversarial example and the original audio signal. We compare the performance of PFAPG against the conventional mean-squared-error (MSE) constraint on three audio recognition datasets: speaker recognition, speech commands, and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Our results indicate that, on average, PFAPG equipped with the loudness-constrained loss function yields an 11% higher success rate while reducing the undesirable distortion artifacts in the adversarial audio by 10 dB compared to the prevalent MSE constraint.
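To make the contrast concrete, below is a minimal sketch of how a loudness-constrained penalty might differ from the conventional MSE penalty between the adversarial example and the original signal. This is not the paper's exact perceptual model: the frame length, hop size, log-RMS loudness measure, and the audibility floor `floor_db` are all illustrative assumptions.

```python
import torch

def mse_penalty(x_adv, x_orig):
    """Conventional L2/MSE penalty between adversarial and original audio."""
    return torch.mean((x_adv - x_orig) ** 2)

def loudness_penalty(x_adv, x_orig, frame_len=512, hop=256, eps=1e-8):
    """Hypothetical loudness-constrained penalty: instead of raw sample error,
    penalize the per-frame loudness (log-RMS energy in dB) of the perturbation
    that rises above an audibility floor, so optimization pushes perturbation
    energy toward frames where it is less perceptible."""
    delta = x_adv - x_orig
    # Slice the perturbation into overlapping frames: (..., n_frames, frame_len).
    frames = delta.unfold(-1, frame_len, hop)
    rms = torch.sqrt(torch.mean(frames ** 2, dim=-1) + eps)  # per-frame RMS
    loudness_db = 20.0 * torch.log10(rms + eps)              # per-frame loudness in dB
    floor_db = -60.0  # assumed audibility floor; a tunable hyperparameter
    # Only frames louder than the floor contribute to the penalty.
    return torch.mean(torch.relu(loudness_db - floor_db))

# Usage on a dummy 1 s signal at 16 kHz with a small perturbation:
x = torch.randn(16000)
x_adv = x + 1e-3 * torch.randn_like(x)
print(mse_penalty(x_adv, x).item())       # raw L2 error
print(loudness_penalty(x_adv, x).item())  # perceptually weighted penalty
```

Under these assumptions, a perturbation whose per-frame loudness stays below the floor incurs essentially no penalty, which illustrates the mechanism by which a loudness constraint lets the attack concentrate energy where it is least audible, rather than minimizing sample-level error uniformly as MSE does.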