Detecting Replay Attack on Voice-Controlled Systems using Small Neural Networks

Nadeen Ahmed, Jowaria Khan, Nouran Sheta, Rahma Tarek, Imran A. Zualkernan, Fadi A. Aloul

Published: 2022, Last Modified: 16 May 2025, RTSI 2022, CC BY-SA 4.0

Abstract: Voice control is becoming a common interface for many consumer IoT systems. Common threats to such systems include impersonation, replay, speech synthesis, and voice conversion attacks. Of these, replay is the easiest to mount: an attacker simply records a legitimate command and plays it back. This paper develops a lightweight neural network for intrusion detection, trained on a recent dataset of replayed voice commands. The proposed model combines a one-dimensional Convolutional Neural Network (1D CNN) with Long Short-Term Memory (LSTM) layers. It was compared with baseline Gaussian Mixture Models (GMMs) using Constant-Q Cepstral Coefficients (CQCC) and Mel-Frequency Cepstral Coefficients (MFCC) as features. The proposed model outperformed both GMM baselines and is significantly smaller, making it more practical for implementation on embedded systems.
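The size argument can be made concrete by counting stored parameters. The sketch below is a back-of-the-envelope comparison under purely hypothetical settings: two `Conv1d`-style layers (32 filters, kernel 3) over 40-dimensional frame features, one LSTM with 32 hidden units and a single-logit output, versus a pair of diagonal-covariance GMMs (genuine and replay) with 512 components over 60-dimensional cepstral features. None of these sizes come from the paper, all function names are illustrative, and the LSTM count follows PyTorch's convention of two bias vectors per layer.

```python
"""Rough parameter-count comparison: a small 1D CNN + LSTM classifier versus
a pair of GMMs (genuine vs. replay). All layer sizes are hypothetical."""


def conv1d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    # One weight per (out_ch, in_ch, kernel) tap plus one bias per output channel.
    return out_ch * (in_ch * kernel + 1)


def lstm_params(input_size: int, hidden: int) -> int:
    # Four gates, each with input weights, recurrent weights and
    # two bias vectors (PyTorch-style bias_ih and bias_hh).
    return 4 * (hidden * (input_size + hidden) + 2 * hidden)


def dense_params(in_features: int, out_features: int) -> int:
    # Fully connected layer: weight matrix plus bias vector.
    return out_features * (in_features + 1)


def diag_gmm_params(components: int, dim: int) -> int:
    # Per component: a mean vector, a diagonal variance vector and a weight.
    return components * (2 * dim + 1)


def cnn_lstm_params(n_features: int = 40) -> int:
    # Hypothetical stack: Conv1d -> Conv1d -> LSTM -> single-logit output.
    return (
        conv1d_params(n_features, 32, 3)
        + conv1d_params(32, 32, 3)
        + lstm_params(32, 32)
        + dense_params(32, 1)
    )


def gmm_baseline_params(components: int = 512, dim: int = 60) -> int:
    # Two GMMs: one trained on genuine speech, one on replayed speech.
    return 2 * diag_gmm_params(components, dim)


if __name__ == "__main__":
    print("CNN + LSTM:", cnn_lstm_params())
    print("GMM pair:  ", gmm_baseline_params())
```

With these assumed sizes the CNN + LSTM stack has 15,457 parameters against 123,904 for the GMM pair, roughly 62 KB versus 496 KB when stored as float32. The actual ratio depends entirely on the configurations used, so this only illustrates why a compact recurrent model can undercut a large-mixture GMM.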
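GMM countermeasure baselines of the kind used in the ASVspoof challenges typically train one GMM on genuine speech and one on spoofed (here, replayed) speech, then score an utterance by the average log-likelihood ratio between the two. The following is a minimal pure-Python sketch of that scoring step only, assuming diagonal covariances and that CQCC or MFCC frames have already been extracted. The `DiagGMM` class and `llr_score` function are hypothetical helpers, not the paper's code, and training (usually done with EM) is omitted.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class DiagGMM:
    weights: list[float]          # mixture weights, assumed to sum to 1
    means: list[list[float]]      # one mean vector per component
    variances: list[list[float]]  # one diagonal variance vector per component

    def frame_log_likelihood(self, x: list[float]) -> float:
        """log p(x) for one feature frame, via log-sum-exp over components."""
        log_terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # Log-density of a diagonal Gaussian: sum of per-dimension terms.
            log_gauss = -0.5 * sum(
                math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            log_terms.append(math.log(w) + log_gauss)
        # Subtract the largest term before exponentiating for numerical stability.
        peak = max(log_terms)
        return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

    def avg_log_likelihood(self, frames: list[list[float]]) -> float:
        """Mean per-frame log-likelihood over an utterance."""
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


def llr_score(frames: list[list[float]], genuine: DiagGMM, replay: DiagGMM) -> float:
    """Positive scores favour genuine speech, negative scores favour a replay."""
    return genuine.avg_log_likelihood(frames) - replay.avg_log_likelihood(frames)


if __name__ == "__main__":
    # Toy one-component models over 2-D "features", for illustration only.
    genuine = DiagGMM([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
    replay = DiagGMM([1.0], [[3.0, 3.0]], [[1.0, 1.0]])
    print(llr_score([[0.1, -0.2], [0.0, 0.3]], genuine, replay))
```

A decision threshold on the score then separates genuine from replayed commands; in practice that threshold is tuned on development data rather than fixed at 0.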