Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
TL;DR: Inspired by Variation Theory, we use neuro-symbolic patterns to guide an LLM in generating counterfactual examples. We then evaluate the generated counterfactuals on their ability to address the cold-start problem in active learning.
Abstract: Active Learning (AL) allows models to learn interactively from user feedback. This paper introduces a counterfactual data augmentation approach to AL, particularly addressing the selection of datapoints for user querying, a pivotal concern in enhancing data efficiency. Our approach is inspired by Variation Theory, a theory of human concept learning that emphasizes the essential features of a concept by focusing on what stays the same and what changes. Instead of querying only with existing datapoints, our approach synthesizes artificial datapoints that highlight key similarities and differences among labels, using a neuro-symbolic pipeline that combines large language models (LLMs) with rule-based models. Through an experiment in the example domain of text classification, we show that our approach achieves accuracy comparable to that of prevalent AL strategies while requiring fewer annotations. This research sheds light on integrating theories of human learning into the optimization of AL.
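
To make the abstract's pipeline concrete, here is a minimal sketch (not the authors' actual system): it assumes a keyword regex standing in for the symbolic pattern and a stubbed function in place of the LLM call. The rule pins down the label-defining feature, the "LLM" varies it while holding the surrounding context fixed, and a symbolic check keeps only candidates that actually cross the concept boundary, in the spirit of Variation Theory's same/different contrast.

    import re
    from dataclasses import dataclass
    from typing import Optional

    # Assumed rule for a "negative" sentiment label; in the paper this role
    # is played by learned neuro-symbolic patterns, not a hand-written regex.
    NEGATIVE_RULE = re.compile(r"\b(terrible|awful|broken)\b", re.IGNORECASE)

    @dataclass
    class Counterfactual:
        text: str
        label: str

    def llm_rewrite(text: str, instruction: str) -> str:
        # Stub standing in for an LLM call; a real pipeline would prompt a
        # model with `instruction`. Here we crudely simulate the edit so the
        # sketch stays runnable.
        return NEGATIVE_RULE.sub("wonderful", text)

    def generate_counterfactual(text: str, source_label: str) -> Optional[Counterfactual]:
        match = NEGATIVE_RULE.search(text)
        if match is None:
            return None  # the rule does not explain this example; skip it
        # Variation Theory: hold everything else constant and vary only the
        # essential feature, so the synthesized datapoint isolates what
        # flips the label.
        instruction = (
            f"Rewrite the text so it no longer expresses '{source_label}', "
            f"changing only the sentiment carried by '{match.group(0)}'."
        )
        candidate = llm_rewrite(text, instruction)
        # Symbolic check: accept only candidates where the rule no longer
        # fires, i.e. the example has crossed the concept boundary.
        if NEGATIVE_RULE.search(candidate):
            return None
        return Counterfactual(text=candidate, label="positive")

    print(generate_counterfactual("The product is terrible and arrived late.", "negative"))
    # -> Counterfactual(text='The product is wonderful and arrived late.', label='positive')

Such synthesized minimal pairs would then serve as the query candidates shown to the annotator, rather than existing datapoints alone, which is the abstract's core departure from standard AL selection strategies.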
Paper Type: long
Research Area: Generation
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English