Robust Semantic Reasoning in Audio Language Models via In-Context Learning
Keywords: Audio language models, speech recognition, accented speech, semantic reasoning, in-context learning
TL;DR: A lightweight method for improving semantic reasoning in audio language models.
Abstract: Audio–language models (ALMs) have recently shown strong zero-shot performance on speech understanding tasks, yet their robustness to accented speech and their ability to perform semantic reasoning remain underexplored. In this work, we investigate whether reasoning failures in ALMs stem primarily from acoustic mismatch or from linguistic decision bias. We evaluate multiple generative ALMs on audio entailment across accented and domain-shifted datasets and observe pronounced class imbalance in their predictions, with entailment dominating, despite competitive overall accuracy. We then introduce an in-context learning (ICL) framework that conditions next-token prediction models on class-balanced semantic exemplars to recalibrate their decision boundaries without parameter updates. Results show that ICL improves class balance and macro-F1, with larger gains on accented data than on domain-matched speech, suggesting that many observed failures arise from linguistic inference bias rather than purely acoustic degradation. Our findings provide new evidence that contextual semantic calibration is an effective, lightweight strategy for improving reasoning reliability in audio–language models under accent variability.
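The abstract rests on two mechanics: building an in-context prompt from class-balanced exemplars, and using macro-F1 to expose a model that over-predicts entailment while keeping competitive accuracy. The following is a minimal Python sketch of both, not the authors' implementation. It assumes a three-way label set (entailment, neutral, contradiction) and a text prompt in which audio clips are referenced by placeholder tags such as `<clip0>`; how audio is actually interleaved depends on the specific ALM's input format. The function names, prompt template, and label set are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical three-way label set for audio entailment (an assumption).
LABELS = ("entailment", "neutral", "contradiction")


def select_balanced_exemplars(pool, per_class, seed=0):
    """Pick `per_class` exemplars for every label, then shuffle their order.

    `pool` is a list of (audio_id, hypothesis, label) tuples; audio_id is a
    hypothetical placeholder for the premise audio clip.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in pool:
        by_label[example[2]].append(example)
    chosen = []
    for label in LABELS:
        if len(by_label[label]) < per_class:
            raise ValueError(f"not enough exemplars for label {label!r}")
        chosen.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(chosen)  # avoid grouping all exemplars of one class together
    return chosen


def build_prompt(exemplars, query_audio_id, query_hypothesis):
    """Format exemplars plus the query as a hypothetical text template."""
    blocks = [
        f"Audio: <{audio_id}>\nHypothesis: {hyp}\nAnswer: {label}\n"
        for audio_id, hyp, label in exemplars
    ]
    # The query ends with an open "Answer:" for next-token prediction.
    blocks.append(f"Audio: <{query_audio_id}>\nHypothesis: {query_hypothesis}\nAnswer:")
    return "\n".join(blocks)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy pool: 12 examples, 4 per label.
    pool = [(f"clip{i}", f"hyp{i}", LABELS[i % 3]) for i in range(12)]
    shots = select_balanced_exemplars(pool, per_class=2)
    print(build_prompt(shots, "query_clip", "A dog is barking."))

    # A degenerate model that always answers "entailment" on a 60/20/20 split.
    y_true = ["entailment"] * 6 + ["neutral"] * 2 + ["contradiction"] * 2
    y_pred = ["entailment"] * 10
    print(round(accuracy(y_true, y_pred), 3))  # 0.6
    print(round(macro_f1(y_true, y_pred), 3))  # 0.25
```

The toy evaluation shows why the two metrics diverge: always predicting entailment scores 0.6 accuracy on a 60/20/20 split, but macro-F1 falls to 0.25 because the neutral and contradiction classes each contribute an F1 of zero. Drawing an equal number of exemplars per class is one simple way to keep the prompt itself from reinforcing that bias.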
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 83