Unified Spoken Language Understanding from Heterogeneous Data: Spoken NER and SA as a Case Study

ACL ARR 2026 January Submission2302 Authors

02 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: automatic speech recognition; spoken language understanding; named entity recognition; sentiment analysis
Abstract: Spoken Language Understanding (SLU) plays a fundamental role in enabling intelligent systems to comprehend human speech. It encompasses multiple tasks, such as Automatic Speech Recognition (ASR), spoken Named Entity Recognition (NER), and Sentiment Analysis (SA). Existing approaches typically address these tasks in isolation using task-specific models. This paradigm limits the effective utilization of heterogeneous datasets across tasks, restricts cross-task interaction, and increases overall system complexity. In this work, we propose a unified framework that jointly models different SLU tasks within a single architecture, taking spoken NER and spoken SA as a case study. Specifically, we design a unified, task-agnostic representation that facilitates the effective use of heterogeneous datasets across multiple tasks. Building upon this representation, we further propose a unified generative approach that jointly models ASR, spoken NER, and SA. Extensive experiments on public SLU datasets demonstrate that our method achieves superior SLUE scores compared to prior methods. Notably, compared with several popular LLM-based methods, our method improves the SLUE score by more than 2 points while achieving up to a 5× improvement in efficiency.
Paper Type: Long
Research Area: Speech Processing and Spoken Language Understanding
Research Area Keywords: spoken language understanding
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Chinese
Submission Number: 2302