Cluster & Tune: Enhance BERT Performance in Low Resource Text Classification

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: low resource, BERT, clustering
Abstract: In data-constrained cases, the common practice of fine-tuning BERT for a target text classification task is prone to producing poor performance. In such low-resource scenarios, we suggest performing an unsupervised classification task prior to fine-tuning on the target task. Specifically, as this intermediate task, we perform unsupervised clustering and train BERT to predict the cluster labels. We test this hypothesis on various datasets and show that this additional classification step can reduce the demand for labeled examples. We further discuss under which conditions this task is helpful, and why.
One-sentence Summary: We suggest adding an unsupervised intermediate classification step, after pretraining BERT and before fine-tuning it, and show that it improves performance in data-constrained cases.
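
To make the described pipeline concrete, the sketch below illustrates the cluster-then-tune idea from the abstract. The specific choices here (k-means over TF-IDF vectors as the unsupervised clustering step, the bert-base-uncased checkpoint, 50 clusters, and the Hugging Face Trainer setup) are illustrative assumptions, not the paper's exact configuration.

import torch
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)


def cluster_pseudo_labels(texts, n_clusters=50, seed=0):
    # Unsupervised step: group unlabeled texts into pseudo-classes.
    # (Assumed here: k-means over TF-IDF vectors; the paper's clustering choice may differ.)
    vecs = TfidfVectorizer(max_features=10000, stop_words="english").fit_transform(texts)
    return KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(vecs)


class TextDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(int(self.labels[i]))
        return item


def fine_tune(model, dataset, output_dir, epochs=1):
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=dataset).train()
    return model


def cluster_and_tune(unlabeled_texts, labeled_texts, labels, n_clusters=50):
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")

    # 1) Intermediate task: train BERT to predict the unsupervised cluster ids.
    pseudo = cluster_pseudo_labels(unlabeled_texts, n_clusters)
    inter = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=n_clusters)
    fine_tune(inter, TextDataset(unlabeled_texts, pseudo, tok), "intermediate")

    # 2) Target task: keep the adapted encoder, attach a fresh classification head.
    target = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(set(labels)))
    target.bert = inter.bert  # carry over the encoder adapted on the clustering task
    return fine_tune(target, TextDataset(labeled_texts, labels, tok), "target")

The key design point the sketch tries to capture is that the intermediate classification head is discarded: only the BERT encoder, adapted by predicting cluster labels, is carried into the low-resource target fine-tuning.
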
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=rrEf1KouWb
