ARISE: Automatic Rule Induction and Filtering for Few-shot Text Classification

ACL ARR 2024 June Submission4092 Authors

16 Jun 2024 (modified: 02 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: We propose ARISE, a framework that combines weak supervision, synthetic data generation and contrastive representation learning for few-shot text classification (FSTC). Weak supervision forms a major novelty in ARISE. We propose an automatic rule induction component that induces rules from syntactic n-grams using inductive generalisation. The induced rules capture syntactic information that state-of-the-art neural models often do not capture explicitly. While these rules can be noisy, they are used to learn a label aggregation model with data programming. Subsequently, we jointly train the base classifier along with the label aggregation model to update their parameters. Unlike past work that employs data programming to label unlabeled data points, we use it to verify synthetically generated labeled data. Finally, we combine synthetic data generation and automatic rule induction, via bootstrapping, to iteratively filter the generated rules and data. Our experiments with nine FSTC datasets over diverse domains, and multilingual experiments on seven languages, show consistent and statistically significant improvements for our proposed approach over other state-of-the-art approaches.
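The idea of using noisy rules to verify synthetically generated labeled data can be illustrated with a minimal sketch. It is not the ARISE implementation: the keyword rules stand in for rules induced from syntactic n-grams, a plain majority vote stands in for the learned label aggregation model, and every name here (`make_keyword_rule`, `aggregate`, `filter_synthetic`, the example texts and labels) is hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# A rule maps a text to a label, or to None when it abstains.
Rule = Callable[[str], Optional[str]]


def make_keyword_rule(keyword: str, label: str) -> Rule:
    """Hypothetical rule: vote `label` if `keyword` occurs as a token."""
    def rule(text: str) -> Optional[str]:
        return label if keyword in text.lower().split() else None
    return rule


def aggregate(rules: List[Rule], text: str) -> Optional[str]:
    """Majority vote over the rules that fire (a simple stand-in for a
    learned label aggregation model). Returns None on no votes or a tie."""
    votes = Counter(v for v in (r(text) for r in rules) if v is not None)
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def filter_synthetic(
    rules: List[Rule], examples: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep synthetic (text, label) pairs whose label the rules confirm.
    Dropping unconfirmed examples is an assumption made for this sketch."""
    return [(t, y) for t, y in examples if aggregate(rules, t) == y]


rules = [
    make_keyword_rule("refund", "complaint"),
    make_keyword_rule("broken", "complaint"),
    make_keyword_rule("thanks", "praise"),
]
synthetic = [
    ("I want a refund for this broken phone", "complaint"),
    ("Thanks for the quick delivery", "praise"),
    ("Thanks but it arrived broken", "praise"),  # rules tie, so it is dropped
    ("Where is my order", "complaint"),          # no rule fires, so it is dropped
]
kept = filter_synthetic(rules, synthetic)
print(kept)
```

In this toy setting only the first two synthetic examples survive. A bootstrapping loop in the spirit of the abstract would alternate this filtering step with re-inducing rules from the retained data, though the exact procedure is not specified on this page.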
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: few-shot text classification, few-shot learning, weak supervision, automatic rule induction, automatic rule filtering, inductive generalisation, data augmentation, NLP in resource-constrained settings, data programming
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English, Hindi, Japanese, French, Spanish, German, Chinese
Submission Number: 4092