- Abstract: The application of deep recurrent networks to audio transcription has led to impressive gains in automatic speech recognition (ASR) systems. Many have demonstrated that small adversarial perturbations can fool deep neural networks into incorrectly predicting a specified target with high confidence. Current work on fooling ASR systems have focused on white-box attacks, in which the model architecture and parameters are known. In this paper, we adopt a black-box approach to adversarial generation, combining the approaches of both genetic algorithms and gradient estimation to solve the task. We achieve a 89.25% targeted attack similarity after 3000 generations while maintaining 94.6% audio file similarity.
- Keywords: adversarial attack, adversarial examples, audio processing, speech to text, deep learning, adversarial audio, black box, machine learning
- TL;DR: We present a novel black-box targeted attack on speech to text systems that supports arbitrarily long adversarial transcriptions and achieves state of the art performance.