An explainable deepfake of speech detection method with spectrograms and waveforms

Ning Yu, Long Chen, Tao Leng, Zigang Chen, Xiaoyin Yi

Published: 2024, Last Modified: 12 Jul 2025. J. Inf. Secur. Appl. 2024. License: CC BY-SA 4.0

Abstract: Research on speech deepfakes is crucial for combating the spread of fake information, safeguarding public privacy, and advancing forensic techniques. However, the lack of transparency and explainability in spoofed speech detection models raises concerns about their reliability. In this paper, we propose using raw waveform signals and spectrograms as fused features for a spoofed speech detection model. We use the SHAP method to analyze the feature distribution of spoofed speech detection and to explain the predicted likelihood of fake speech. Our experimental results demonstrate that our approach achieves better classification results with fewer model parameters than other feature fusion methods. Finally, we calculate feature contribution values with the SHAP method and visualize them as heat maps. These heat maps help researchers analyze the feature distribution of spoofed speech, identify the most critical features that distinguish spoofed from bona fide speech, and ensure transparency in their use.
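The feature contribution values mentioned in the abstract come from SHAP (SHapley Additive exPlanations), which is built on Shapley values. The sketch below is only an illustration of that underlying idea, not the authors' model or pipeline: it computes exact Shapley values by brute force for a toy scoring function. The function `toy_spoof_score`, its three inputs (two stand-ins for spectrogram features and one for a waveform feature), and the all-zero baseline are hypothetical and chosen purely for demonstration. In practice, SHAP relies on approximations, since exact enumeration is infeasible for real spectrogram and waveform inputs.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values of value_fn at x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with len(x), so this is for toy inputs only.
    """
    n = len(x)
    phis = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                # Marginal contribution of feature i to this coalition.
                phis[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phis


def toy_spoof_score(features):
    """Hypothetical stand-in for a detector's 'fake' score (assumption)."""
    spec_a, spec_b, wave_c = features
    return 0.5 * spec_a + 0.2 * spec_b * wave_c + 0.1 * wave_c


x = [1.0, 2.0, 3.0]          # hypothetical feature values for one utterance
baseline = [0.0, 0.0, 0.0]   # hypothetical reference input
phis = shapley_values(toy_spoof_score, x, baseline)
print(phis)
```

By construction, the contributions sum to the difference between the score at `x` and the score at the baseline. Arranging such per-feature values on a time-frequency grid is what produces heat maps like those the abstract describes.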