Abstract: Detecting the presence of the electric network frequency (ENF) signal in audio recordings is a prerequisite for applying the ENF criterion, which plays an essential role in numerous forensic applications. However, existing detection methods cannot handle short-duration audio recordings, which have attracted considerable attention due to the popularity of voice messaging apps. This paper proposes a novel deep learning-based approach for ENF detection in short audio recordings, reducing the minimum usable audio duration to one tenth of that required by state-of-the-art methods. In addition, a convolutional attention network termed Conv-AttNet is proposed to improve the detection performance of convolutional neural networks (CNNs) through the attention mechanism. Experiments on both synthetic and real-world audio recordings show that Conv-AttNet can detect the ENF signal buried in audio recordings as short as 2 seconds, surpassing both matched filtering and typical CNNs such as ResNet50. The detection accuracy increases further when longer audio recordings are used.