Keywords: Audio Hallucination, Large Audio Language Model, Benchmark
Abstract: Hallucinations pose a significant challenge in the development and evaluation of large language models (LLMs), directly affecting their reliability and accuracy. While notable progress has been made in research on textual and visual hallucinations, a comprehensive benchmark for evaluating auditory hallucinations in large audio language models (LALMs) is still lacking. To fill this gap, we introduce **AHa-Bench**, a systematic and comprehensive benchmark for audio hallucinations. Audio data uniquely combines the multi-attribute complexity of visual data with the semantic richness of textual data, giving rise to auditory hallucinations that share characteristics with both visual and textual hallucinations. Based on their source, AHa-Bench categorizes hallucinations into semantic hallucinations, acoustic hallucinations, and semantic-acoustic confusion hallucinations. In addition, we systematically evaluate seven open-source LALMs, demonstrating the challenges these models face in audio understanding, especially in jointly understanding semantic and acoustic information. Through this comprehensive evaluation framework, AHa-Bench aims to enhance the robustness and stability of LALMs, fostering more reliable and nuanced audio understanding. The benchmark dataset is available at \url{https://huggingface.co/datasets/ahabench/AHa-Bench}.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/ahabench/AHa-Bench
Code URL: https://github.com/AHa-Bench/AHa-Bench
Supplementary Material: zip
Primary Area: Applications of Datasets & Benchmarks in speech and audio
Flagged For Ethics Review: true
Submission Number: 1586