Backdoor Attacks Against Speech Language Models

Alexandrine Fortier; Thomas Thebaud; Jesus Villalba; Najim Dehak; Patrick Cardinal

Backdoor Attacks Against Speech Language Models

Alexandrine Fortier, Thomas Thebaud, Jesus Villalba, Najim Dehak, Patrick Cardinal

15 Sept 2025 (modified: 03 Jan 2026)ICLR 2026 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Speech Language Model, Backdoor Attacks, LLM, Robustness

TL;DR: We present the first systematic study of audio backdoor attacks against speech language models, showing that the audio encoder is the central vulnerability, ASR is more resistant than other tasks, and fine-tuning can mitigate attacks.

Abstract: Large Language Models (LLMs) and their multimodal extensions are becoming increasingly popular. One common approach to enable multimodality is to cascade domain-specific encoders with an LLM, making the resulting model inherit vulnerabilities from all of its components. In this work, we present the first systematic study of audio backdoor attacks against speech language models. We demonstrate its effectiveness across four speech encoders and three datasets, covering four tasks: automatic speech recognition (ASR), speech emotion recognition, and gender and age prediction. The attack consistently achieves high success rates, ranging from 90.76\% to 99.41\%. To better understand how backdoors propagate, we conduct a component-wise analysis to identify the most vulnerable stages of the pipeline. Finally, we propose a fine-tuning-based defense that mitigates the threat of poisoned pretrained encoders.

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 6394

Loading