Keywords: Audio Hallucination, Large Audio Language Model, Benchmark
Abstract: Hallucinations pose a significant challenge in the development and evaluation of large language models (LLMs), directly affecting their reliability and accuracy. While notable progress has been made in research on textual and visual hallucinations, a comprehensive benchmark for evaluating auditory hallucinations in large audio language models (LALMs) is still lacking. To fill this gap, we introduce **AHa-Bench**, a systematic and comprehensive benchmark for audio hallucinations. Audio data uniquely combines the multi-attribute complexity of visual data with the semantic richness of textual data, giving rise to auditory hallucinations that share characteristics with both visual and textual hallucinations. Based on their source, AHa-Bench categorizes hallucinations into semantic hallucinations, acoustic hallucinations, and semantic-acoustic confusion hallucinations. In addition, we systematically evaluate seven open-source LALMs, demonstrating the challenges these models face in audio understanding, especially in jointly understanding semantic and acoustic information. Through this comprehensive evaluation framework, AHa-Bench aims to enhance the robustness and stability of LALMs, fostering more reliable and nuanced audio understanding. The benchmark dataset is available at \url{https://huggingface.co/datasets/ahabench/AHa-Bench}.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/ahabench/AHa-Bench
Code URL: https://github.com/AHa-Bench/AHa-Bench
Supplementary Material: zip
Primary Area: Applications of Datasets & Benchmarks in speech and audio
Flagged For Ethics Review: true
Submission Number: 1586