Audio Adversarial Example With No Noise in the Silent Area for Speech Recognition System

Published: 2026 · Last Modified: 25 Jan 2026 · IEEE Access 2026 · CC BY-SA 4.0
Abstract: In this paper, we propose a type of adversarial example that is recognized correctly by humans but misrecognized by the target model. It is generated by adding adversarial noise only to the portions of the original audio sample that contain a voice signal, leaving the silent regions untouched. This restriction distinguishes the proposed method from existing methods for generating adversarial examples and makes it nearly impossible for a human to perceive the adversarial noise in the resulting sample. Our experiments used the Mozilla Common Voice dataset with a Deep Speech model. The results demonstrate that the proposed method achieved a success rate of 91.4% in attacking the target model while keeping the average distortion from the original sample as low as 232.20.
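The core idea described in the abstract is to confine the adversarial perturbation to the non-silent parts of the waveform. The following is a minimal sketch of that masking step, not the authors' implementation: the frame length, the energy threshold, and the simple energy-based silence detector are illustrative assumptions, and the optimization of the perturbation against the speech recognizer is omitted.

```python
import numpy as np

def silence_mask(audio, frame_len=512, energy_thresh=1e-3):
    """Return a per-sample mask that is 1 where the frame energy exceeds the
    threshold (active speech) and 0 in silent regions (assumed detector)."""
    mask = np.zeros_like(audio)
    for start in range(0, len(audio), frame_len):
        frame = audio[start:start + frame_len]
        if np.mean(frame ** 2) > energy_thresh:
            mask[start:start + frame_len] = 1.0
    return mask

def apply_masked_perturbation(audio, delta, mask):
    """Add adversarial noise only where the mask is 1, so silence stays clean."""
    return audio + mask * delta

# Toy usage: silence, a burst of "speech", then silence again.
rng = np.random.default_rng(0)
audio = np.concatenate([
    np.zeros(2048),
    0.5 * np.sin(2 * np.pi * 440 * np.arange(2048) / 16000),
    np.zeros(2048),
]).astype(np.float32)
delta = 0.01 * rng.standard_normal(audio.shape).astype(np.float32)

mask = silence_mask(audio)
adv = apply_masked_perturbation(audio, delta, mask)
assert np.allclose(adv[:2048], 0.0)  # silent region is unchanged
```

In a full attack, delta would be optimized (for example, by gradient descent against the target transcription of the Deep Speech model) while the mask keeps the silent intervals free of noise.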