Practical Dataless Text Classification Through Dense Retrieval

Anonymous

16 Jan 2022 (modified: 05 May 2023)
ACL ARR 2022 January Blind Submission
Readers: Everyone
Abstract: Dataless text classification aims to classify documents using only class descriptions, without any training data. Recent research shows that pre-trained textual entailment models can achieve state-of-the-art dataless classification performance on various tasks. However, such models are impractical for two reasons: prediction is slow, because they need k forward passes to predict over k classes, and they are not built for fine-tuning to improve on their initial (often mediocre) performance. This work proposes a simple, effective, and practical dataless classification approach. We use class descriptions as queries to retrieve task-specific or external unlabeled data, assign pseudo-labels to the retrieved data, and train a classifier on it. Experiments on a wide range of classification tasks show that the proposed approach consistently outperforms entailment-based models in classification accuracy, prediction speed, and performance gain when fine-tuned on labeled data.
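The retrieve, pseudo-label, and train pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the retriever or the classifier, so the sketch assumes a bag-of-words cosine similarity as a stand-in for a dense retriever and a nearest-centroid model as a stand-in for the trained classifier. All function names (`embed`, `pseudo_label`, `train_centroids`, `predict`) and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # ASSUMPTION: bag-of-words counts stand in for a dense encoder.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pseudo_label(class_descriptions, unlabeled_docs, top_k):
    """Use each class description as a query; label its top_k retrieved docs."""
    doc_vecs = [embed(doc) for doc in unlabeled_docs]
    best = {}  # doc index -> (score, label); keeps the highest-scoring class
    for label, description in class_descriptions.items():
        query = embed(description)
        scores = [cosine(query, vec) for vec in doc_vecs]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        for i in ranked[:top_k]:
            if scores[i] > 0 and (i not in best or scores[i] > best[i][0]):
                best[i] = (scores[i], label)
    return {i: label for i, (_, label) in best.items()}


def train_centroids(unlabeled_docs, pseudo_labels):
    # ASSUMPTION: a nearest-centroid model stands in for the trained classifier.
    centroids = {}
    for i, label in pseudo_labels.items():
        centroids.setdefault(label, Counter()).update(embed(unlabeled_docs[i]))
    return centroids


def predict(centroids, text):
    # One pass over the input scores every class, unlike k entailment passes.
    vec = embed(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


classes = {
    "sports": "sports game team player",
    "tech": "technology computer software",
}
docs = [
    "the team won the game",
    "new software for your computer",
    "player scores in final game",
    "computer chip technology advances",
    "weather is nice today",
]
labels = pseudo_label(classes, docs, top_k=2)
model = train_centroids(docs, labels)
print(predict(model, "the player joined a new team"))
```

In this toy setup, documents that match no class description stay unlabeled and are left out of training. A real system would presumably replace `embed` with a learned sentence encoder and the centroid model with a fine-tunable neural classifier.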
Paper Type: long