When depth is redundant: Efficient transformer-based speech anti-spoofing

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: anti-spoofing, transformer, similarity, analysis
Abstract: Detecting speech deepfakes is critical for protecting society against fraud, identity theft, and the misuse of modern speech synthesis technologies. Despite recent progress, existing countermeasures often exhibit limited generalization to unseen spoofing attacks, particularly in out-of-domain evaluation settings, even when achieving strong in-domain performance. Transformer architectures have become ubiquitous in anti-spoofing, serving both as feature extractors (e.g., wav2vec 2.0) and as classifiers. However, deep transformer stacks exhibit substantial representational redundancy across adjacent layers, with similarity increasing toward deeper layers. As a result, task-specific specialization is largely concentrated in the final layers, while shallow layers remain underutilized during fine-tuning. In this work, we analyze the layer-wise behavior of transformer-based classifiers for speech deepfake detection and propose a training strategy that explicitly aligns shallow and intermediate representations with those of the final transformer layer. By encouraging all layers to mimic the task-specialized representation learned at depth, the model more effectively exploits early-layer features while preserving discriminative capacity in deeper layers. This design improves robustness to unseen spoofing attacks and enhances out-of-domain generalization. Extensive experiments across multiple benchmark datasets demonstrate consistent performance gains over strong baselines.
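The abstract does not specify the exact alignment objective. The sketch below is one plausible reading, assuming a PyTorch-style classifier that exposes per-layer hidden states, a cosine-similarity alignment term, and a hypothetical weighting factor `lam`; function names and defaults are illustrative, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): an auxiliary layer-alignment loss that
# encourages shallow/intermediate transformer layers to mimic the final layer.
import torch
import torch.nn.functional as F

def layer_alignment_loss(hidden_states, detach_target=True):
    """hidden_states: list of [B, T, D] tensors, one per transformer layer."""
    target = hidden_states[-1]
    if detach_target:
        # Stop gradients through the final layer so it keeps its
        # task-specialized role rather than collapsing toward earlier layers.
        target = target.detach()
    losses = []
    for h in hidden_states[:-1]:
        # 1 - cosine similarity between each layer and the final layer,
        # averaged over time steps and the batch.
        cos = F.cosine_similarity(h, target, dim=-1)  # [B, T]
        losses.append((1.0 - cos).mean())
    return torch.stack(losses).mean()

def total_loss(logits, labels, hidden_states, lam=0.1):
    # Binary bona fide / spoof classification plus the alignment term.
    cls = F.cross_entropy(logits, labels)
    return cls + lam * layer_alignment_loss(hidden_states)
```

In this reading, the auxiliary term pushes early layers toward the task-specialized final-layer representation during fine-tuning, while detaching the target keeps the deepest layer's discriminative capacity intact.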
Paper Type: Long
Research Area: Speech Processing and Spoken Language Understanding
Research Area Keywords: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Contribution Types: Model analysis & interpretability, Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 10775