CLID: A Chunk-Level Intent Detection Framework for Multiple Intent Spoken Language Understanding

Published: 01 Jan 2022 · Last Modified: 18 Jun 2024 · IEEE Signal Process. Lett. 2022 · CC BY-SA 4.0
Abstract: Multi-intent spoken language understanding (SLU), which handles utterances containing multiple intents, is more practical than single-intent SLU and has attracted increasing attention. However, existing state-of-the-art models perform intent detection at a granularity that is either too coarse (utterance-level) or too fine (token-level), and thus may fail to recognize the intent transition point and the correct intents in an utterance. In this paper, we propose a Chunk-Level Intent Detection (CLID) framework, in which a sliding window-based self-attention (SWSA) scheme performs regional chunk intent detection. Based on the SWSA, an auxiliary task identifies the intent transition point in an utterance, splitting it into sub-utterances that each carry a single intent. The intent of each sub-utterance is then predicted by assembling the intent predictions of the chunks (obtained in a sliding-window manner) within it. We conduct experiments on two public datasets, MixATIS and MixSNIPS, and the results show that our model achieves state-of-the-art performance.
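To make the two core ideas of the abstract concrete, the sketch below illustrates (a) a sliding-window attention mask, in which each token attends only to its local chunk, and (b) assembling per-chunk intent predictions into a single sub-utterance intent by majority vote. This is an illustrative simplification under assumed details, not the paper's actual SWSA implementation: the window radius, the voting rule, and the intent labels are all hypothetical.

```python
import numpy as np

def sliding_window_mask(seq_len, radius):
    """Boolean mask where token i may attend only to tokens j with
    |i - j| <= radius, a minimal stand-in for the SWSA locality idea."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= radius

def assemble_intent(chunk_intents):
    """Majority vote over per-chunk intent predictions within one
    sub-utterance (one plausible way to 'assemble' chunk predictions)."""
    values, counts = np.unique(chunk_intents, return_counts=True)
    return values[np.argmax(counts)]

mask = sliding_window_mask(6, 1)
print(mask[0])  # token 0 attends only to tokens 0 and 1
print(assemble_intent(["BookFlight", "BookFlight", "PlayMusic"]))
```

In a real model, the mask would be applied to the attention logits before the softmax, and the per-chunk predictions would come from a classifier head over each windowed chunk representation.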