Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding

Maximillian Chen; Alexandros Papangelis; Chenyang Tao; Andy Rosenbaum; Seokhwan Kim; Yang Liu; Zhou Yu; Dilek Hakkani-Tur

03 Oct 2022 (modified: 06 Jul 2025), NeurIPS 2022 SyntheticData4ML, Readers: Everyone

Keywords: prompting, dialogue understanding, data augmentation

TL;DR: Applying weak supervision to create high-quality augmented conversation datasets generated through prompting.

Abstract: Dialogue understanding tasks often require abundant annotated data to achieve good performance, which presents challenges in low-resource settings. To alleviate this barrier, we explore few-shot data augmentation for dialogue understanding by prompting large pre-trained language models, and we present a novel approach that iteratively improves augmentation quality by applying weakly supervised filters. We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue. Models fine-tuned on our augmented data mixed with few-shot ground truth data approach or surpass existing full-shot state-of-the-art performance on both datasets. For DailyDialog specifically, using 10% of the ground truth data, we outperform the current state-of-the-art model, which uses 100% of the data.

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/weakly-supervised-data-augmentation-through/code)
