Automated Human-Readable Label Generation in Open Intent Discovery

Grant Anderson, Emma Hart, Dimitra Gkatzia, Ian Beaver

Published: 01 Jan 2024, Last Modified: 04 Aug 2025 | INTERSPEECH 2024 | CC BY-SA 4.0

Abstract: The correct determination of user intent is key in dialog systems. However, an intent classifier often requires a large, labelled training dataset to identify a set of known intents. Creating such a dataset is a complex and time-consuming task that usually involves humans applying clustering tools to unlabelled data, analysing the results, and creating human-readable labels for each cluster. While many Open Intent Discovery works tackle the problem of discovering clusters of common intent, few generate a human-readable label that can be used to make decisions in downstream systems. To address this, we introduce a novel candidate label extraction method, then evaluate six combinations of candidate extraction and label selection methods on three datasets. We find that our extraction method produces more detailed labels than the alternatives and that high-quality intent labels can be generated from unlabelled data without resorting to costly pre-trained language models.