VQ-TEGAN: Data Augmentation with Text Embeddings for Few-shot Learning

ACL ARR 2024 June Submission3841 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Data augmentation is crucial for fine-tuning pre-trained models and making the most of limited data, particularly in few-shot learning. Traditionally, these techniques have been applied at the word and sentence levels, with little research conducted within the embedding space. This paper introduces VQ-TEGAN, a novel data augmentation approach designed to generate embeddings specifically for few-shot learning. VQ-TEGAN augments the few-shot dataset by training directly within the pre-trained language model's (PLM's) word embedding space, employing a customized loss function. Empirical validation on GLUE benchmark datasets demonstrates that VQ-TEGAN markedly improves text classification performance. Additionally, we investigate the application of VQ-TEGAN with RoBERTa-large and BERT-large, offering insights for further applications.
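The abstract's core idea is generating synthetic training points directly in the PLM's embedding space rather than at the word or sentence level. As a minimal sketch of that general idea (not the authors' VQ-TEGAN model, which trains a generative network with a customized loss), the snippet below produces extra examples by interpolating pairs of few-shot embeddings and adding Gaussian noise; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def augment_embeddings(emb, n_new, noise_scale=0.05, seed=0):
    """Generic embedding-space augmentation sketch.

    Interpolates random pairs of few-shot embeddings and perturbs
    them with Gaussian noise. Illustrative only: VQ-TEGAN instead
    learns a generative model over the PLM's embedding space.
    """
    rng = np.random.default_rng(seed)
    n, d = emb.shape
    i = rng.integers(0, n, size=n_new)          # first partner of each pair
    j = rng.integers(0, n, size=n_new)          # second partner
    lam = rng.uniform(0.0, 1.0, size=(n_new, 1))  # mixing coefficient
    mixed = lam * emb[i] + (1.0 - lam) * emb[j]
    return mixed + noise_scale * rng.standard_normal((n_new, d))

# Hypothetical few-shot set: 8 examples with 16-dimensional embeddings.
few_shot = np.random.default_rng(1).standard_normal((8, 16))
extra = augment_embeddings(few_shot, n_new=32)
print(extra.shape)  # (32, 16): synthetic embeddings to append to the training set
```

In practice the synthetic embeddings would be fed to the classifier head alongside the real few-shot examples, which is what makes embedding-level augmentation model-internal rather than text-level.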
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: data augmentation, word embeddings, few-shot learning
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches to low-compute settings / efficiency
Languages Studied: English
Submission Number: 3841