Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models

ACL ARR 2026 January Submission8290 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: audio pun, large audio language models, language understanding

Abstract: Puns are a common linguistic phenomenon that exploits polysemy and phonetic ambiguity to generate humour, posing unique challenges for natural language understanding. Alongside text and images, audio plays a central role in human communication, yet datasets and systematic resources for spoken puns remain scarce, leaving this crucial modality largely underexplored in pun research. In this paper, we present APUN-Bench, the first benchmark dedicated to evaluating large audio language models (LALMs) on audio pun understanding. Our benchmark contains 4,434 audio samples annotated across three stages: pun recognition, pun word location, and pun meaning inference. We conduct an in-depth analysis on APUN-Bench by systematically evaluating 10 state-of-the-art LALMs, uncovering substantial performance gaps in recognizing, localizing, and interpreting audio puns. This analysis reveals key challenges, such as positional biases in audio pun location and error cases in meaning inference, offering actionable insights for advancing humour-aware audio intelligence.

Paper Type: Long

Research Area: Speech Processing and Spoken Language Understanding

Research Area Keywords: audio pun, large audio language models, language understanding

Contribution Types: Model analysis & interpretability, Data resources

Languages Studied: English

Submission Number: 8290
