Abstract: Research on speech deepfake techniques is crucial for combating the spread of fake information, safeguarding public privacy, and advancing forensic techniques. However, the lack of transparency and explainability in spoofed speech detection models raises concerns about their reliability. In this paper, we propose using raw waveform signals and spectrograms as fused input features for a spoofed speech detection model. We use the SHAP (SHapley Additive exPlanations) method to analyze the feature distribution underlying the detection decision and to explain the predicted likelihood that an utterance is fake. Experimental results show that our approach achieves better classification performance with fewer model parameters than other feature fusion methods. Finally, we compute feature contribution values with SHAP and visualize them as heat maps. These visualizations help researchers analyze the feature distribution of spoofed speech, identify the most critical features that distinguish spoofed from bona fide speech, and ensure transparency in how such models are used.