Abstract: Acoustic waves are a form of energy that propagates through various media. They can be represented in different modalities, such as auditory signals and visual patterns, typically described as a one-dimensional waveform in the time domain and a two-dimensional spectrogram in the time-frequency domain. Most acoustic signal processing methods take a single modality as input for training and inference. This poses a challenge for black-box adversarial attacks on audio signals, because the input modality of the target model is also unknown to the attacker. To date, no method has explored the cross-modal transferability of adversarial perturbations. This paper investigates cross-modal transferability from the waveform to the spectrogram. We argue that the data distributions of the two modalities in sample space are related by a mapping, and we propose a novel decision-based, cross-modal, and cross-medium adversarial attack. Specifically, the method generates an initial example with cross-modal attack capability by mixing in random natural noise, then iteratively reduces the perturbation to improve its imperceptibility. While iteratively optimizing adversarial perturbations against black-box audio classification models, it incorporates constraints from the spectrogram sample space, so the resulting perturbation is imperceptible to humans both visually and aurally. Extensive experiments demonstrate that our approach can attack classification models operating on either the sound wave or the spectrogram of the same audio signal. Furthermore, we explore the cross-medium capability of the proposed attack, which can target models processing acoustic signals propagating in air and in seawater. Compared with existing methods, the proposed method offers superior imperceptibility and generalization.
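The following is a minimal sketch of the general decision-based loop that this class of attack follows: start from a heavily noised example that both models already misclassify, then repeatedly move it toward the clean waveform while keeping it adversarial under hard-label feedback only. It is an illustration under stated assumptions, not the authors' exact algorithm; the oracles `predict_waveform` and `predict_spectrogram`, the noise source, and the step rule are hypothetical placeholders, and the paper's specific spectrogram-space constraints are not reproduced here.

```python
# Illustrative sketch of a decision-based, cross-modal attack loop (not the
# authors' method). Assumes two hypothetical black-box hard-label oracles,
# predict_waveform(x) and predict_spectrogram(S), each returning a class id.
import numpy as np
from scipy.signal import stft


def to_spectrogram(x, fs=16000):
    """Map a 1-D waveform to its 2-D magnitude spectrogram (second modality)."""
    _, _, Z = stft(x, fs=fs, nperseg=512)
    return np.abs(Z)


def decision_attack(x, true_label, predict_waveform, predict_spectrogram,
                    noise, steps=1000, rng=np.random.default_rng(0)):
    """Shrink an initially large natural-noise perturbation while both
    black-box models keep misclassifying the example."""
    adv = np.clip(x + noise, -1.0, 1.0)      # initial adversarial example
    for _ in range(steps):
        alpha = rng.uniform(0.0, 0.1)        # candidate step toward the clean signal
        candidate = adv + alpha * (x - adv)
        still_adv = (predict_waveform(candidate) != true_label and
                     predict_spectrogram(to_spectrogram(candidate)) != true_label)
        if still_adv:                        # accept only if the attack still succeeds
            adv = candidate                  # perturbation shrinks, so it becomes less perceptible
    return adv
```

The key property illustrated is that the loop only ever queries hard labels (decision-based) and checks both modalities at every step, so the accepted perturbation remains adversarial for the waveform model and for any model consuming the derived spectrogram.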