Is More Data Better? Using Transformers-Based Active Learning for Efficient and Effective Detection of Abusive Language
Abstract: Annotating abusive language content can cause psychological harm; yet most machine learning research has prioritized efficacy (i.e., F1 or accuracy scores), while little research has analyzed data efficiency (i.e., how to minimize annotation requirements). In this paper, we use a series of simulated experiments over two datasets at varying percentages of abuse to demonstrate that transformers-based active learning is a promising approach that maintains high efficacy while substantially raising efficiency, requiring only a fraction of the labeled data to reach performance equivalent to passive training over the full dataset.
Paper Type: short