Keywords: audio LLM, semantics, focus
Abstract: We introduce a new inference task for audio LLMs, in which the correct response depends crucially on the location of a focal accent. Models are tested under a variety of settings and beat a text-only baseline only with helpful prompting, including few-shot examples. The proposed task shows for the first time how to test the ability of LLMs to incorporate audio information into semantic interpretation. The results show that the task is very challenging for the models tested, indicating that, for spoken language, LLMs lag far behind human abilities.
Paper Type: Short
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: semantics, language models, prosody, inference
Contribution Types: Data resources, Theory
Languages Studied: English
Submission Number: 9119