ISBench: Benchmarking Instruction-Following Capability and Safety of Large Speech-Language Models across Acoustic Conditions
Abstract: Recent advances in Large Speech-Language Models (LSLMs) demonstrate strong speech understanding and cross-modal interaction abilities. However, the lack of standardized evaluation methods hinders their development. Existing evaluation approaches face three limitations: (1) inconsistent datasets prevent fair comparison across models; (2) current benchmarks focus on specific speech tasks but fail to assess how models respond to instructions delivered directly in speech; (3) critical aspects such as safety and robustness are overlooked. To address these issues, we propose ISBench, a benchmark for evaluating the instruction-following capability and safety of LSLMs. Our framework introduces acoustic scenario simulations covering speaker characteristics (gender, age, emotion), environmental factors (background noise), and linguistic variations (colloquial expressions). Through comprehensive experiments on seven open-source models, we reveal key findings: LSLMs show a clear performance gap between the speech and text modalities, perform worse on children's voices, and are highly sensitive to background noise and informal language. ISBench provides researchers with a unified evaluation platform to advance LSLM development.
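The submission page gives no implementation details, but as a rough illustration of the "background noise" condition described above, the sketch below mixes a noise track into a clean speech waveform at a target signal-to-noise ratio. The function name mix_at_snr and the specific SNR handling are assumptions for illustration, not taken from ISBench.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix background noise into a speech waveform at a target SNR (in dB).

    Hypothetical helper; ISBench's actual simulation pipeline is not
    described on this page.
    """
    # Tile or trim the noise so it matches the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-10  # guard against silent noise clips

    # Scale the noise so that 10 * log10(speech_power / scaled_noise_power)
    # equals the requested snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

A benchmark of this kind would presumably sweep several SNR levels (e.g., 20, 10, and 0 dB) to probe the noise sensitivity the abstract reports; those values are illustrative, not from the paper.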
Paper Type: Short
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: Benchmark; Large Speech-Language Models
Contribution Types: Data resources
Languages Studied: English
Submission Number: 7235