Construction of Query Classification System and Query Classification Corpus for Chinese Intent Recognition

Shan Yu, Hua Liu, Pengyuan Liu

Published: 2024, Last Modified: 07 Oct 2025, CLSW (2) 2024, CC BY-SA 4.0

Abstract: This paper is motivated by the need for intent recognition in human-machine dialogue. To train a more general and transferable Chinese intent recognition model, we propose a compromise query classification system, based on a multi-concept semantic representation framework, that is suited to task-oriented dialogue scenarios. The system consists of 4 major categories and 27 subcategories, takes syntax, semantics, and pragmatics into account, and has high coverage. Based on this system, we constructed a multi-domain query type annotation corpus through manual annotation and automatic machine verification; it comprises 88,420 queries from three different domains. Finally, we performed in-domain and cross-domain query classification experiments with BERT and Text-CNN models trained on the query-type annotated corpus. The results indicate that the trained BERT and Text-CNN classifiers achieve high accuracy on both in-domain and cross-domain datasets, demonstrating good transferability and generality. However, due to uneven data distribution, some cross-domain query classification tasks show lower accuracy.

External IDs: dblp:conf/clsw/YuLL24