RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules

Published: 23 Jan 2024, Last Modified: 23 May 2024TheWebConf24 OralEveryoneRevisionsBibTeX
Keywords: weak supervision, text classification, seed word, pre-trained language model, prompt, logical rule, rule mining, pseudo label
Abstract: Weakly supervised text classification (WSTC), also called zero-shot or dataless text classification, has attracted increasing attention due to its applicability in classifying a mass of texts within the dynamic and open Internet environment, since it requires only a limited set of seed words (label names) for each category instead of labeled data. With the help of recently popular prompting pre-trained language models (PLMs), many studies leveraged manually crafted and/or automatically identified verbalizers to estimate the likelihood of categories, but they failed to differentiate the effects of these category-indicative words, let alone capture their correlations and realize adaptive adjustments according to the unlabeled corpus. In this paper, in order to let the PLM better understand each category, we at first propose a novel form of rule-based knowledge using logical expressions to characterize the meanings of categories. Then, we develop a prompting PLM-based approach named RulePrompt for the WSTC task, consisting of a rule mining module and a rule-enhanced pseudo label generation module, plus a self-supervised fine-tuning module to make the PLM align with this task. Within this framework, the inaccurate pseudo labels assigned to texts and the imprecise logical rules associated with categories mutually enhance each other in an alternative manner, establishing a self-iterative closed loop of knowledge (rule) acquisition and utilization, with seed words serving as the starting point. Extensive experiments validate the effectiveness and robustness of our approach, which outperforms state-of-the-art weakly supervised methods. Importantly, our approach yields interpretable category rules, proving its advantageous for disambiguating easily-confused categories.
Track: Web Mining and Content Analysis
Submission Guidelines Scope: Yes
Submission Guidelines Blind: Yes
Submission Guidelines Format: Yes
Submission Guidelines Limit: Yes
Submission Guidelines Authorship: Yes
Student Author: Yes
Submission Number: 1740
Loading