ISBench: Benchmarking Instruction-Following Capability and Safety of Large Speech-Language Models across Acoustic Conditions
Abstract: Recent advances in Large Speech-Language Models (LSLMs) demonstrate strong speech understanding and cross-modal interaction abilities. However, the lack of standardized evaluation methods hinders their development. Existing evaluation approaches face three limitations: (1) inconsistent datasets prevent fair comparison across models; (2) current benchmarks focus on specific speech tasks but fail to assess how models respond to instructions delivered directly in speech; (3) critical aspects such as safety and robustness are overlooked. To address these issues, we propose ISBench, a benchmark for evaluating the instruction-following capability and safety of LSLMs. Our framework introduces acoustic scenario simulations covering speaker characteristics (gender, age, emotion), environmental factors (background noise), and linguistic variations (colloquial expressions). Through comprehensive experiments on seven open-source models, we reveal key findings: LSLMs show a clear performance gap between the speech and text modalities, perform worse on children's voices, and are highly sensitive to background noise and informal language. ISBench provides researchers with a unified evaluation platform to advance LSLM development.
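The submission page gives no implementation details, but as a rough illustration of the "background noise" condition described above, the sketch below mixes a noise track into a clean speech waveform at a target signal-to-noise ratio. The function name mix_at_snr and the specific SNR handling are assumptions for illustration, not taken from ISBench.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix background noise into a speech waveform at a target SNR (in dB).

    Hypothetical helper; ISBench's actual simulation pipeline is not
    described on this page.
    """
    # Tile or trim the noise so it matches the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-10  # guard against silent noise clips

    # Scale the noise so that 10 * log10(speech_power / scaled_noise_power)
    # equals the requested snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

A benchmark of this kind would presumably sweep several SNR levels (e.g., 20, 10, and 0 dB) to probe the noise sensitivity the abstract reports; those values are illustrative, not from the paper.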
Paper Type: Short
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: Benchmark; Large Speech-Language Models
Contribution Types: Data resources
Languages Studied: English
Submission Number: 7235