Open-Set Text Classification with Limited Labeling Budget

Amit Tulsidas Chaulwar

23 Jan 2025 (modified: 18 Jun 2025). Submitted to ICML 2025. License: CC BY-NC 4.0

Abstract: Despite tremendous improvements in the performance of NLP models, applying such models to new domains, languages, or styles remains expensive because of the cost of gathering and labelling task-specific data. Practical systems must also handle open-set recognition scenarios, in which a sample from an unknown category may be encountered. We propose two methods, sample sparsification and sample amplification, that address these two problems, learning from a small labelled dataset and open-set recognition, respectively. We demonstrate the effectiveness of the proposed methods on text classification tasks using multiple open-source datasets.

Primary Area: Applications->Language, Speech and Dialog

Keywords: Open-set classification, few-shot learning

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.

Submission Number: 8901
