LINGUIST: Language Model Instruction Tuning to Generate Utterances for Intent Classification and Slot Tagging

Anonymous

04 Mar 2022 (modified: 05 May 2023) · Submitted to NLP for ConvAI
Keywords: data augmentation, NLU, multilingual, large seq2seq models
TL;DR: We present a method for generating multilingual synthetic data for Intent Classification and Slot Tagging (IC+ST) using flexible instruction prompts.
Abstract: We present LINGUIST, a method for generating synthetic data for Intent Classification and Slot Tagging (IC+ST), based on a 5B-parameter multilingual seq2seq model fine-tuned on a flexible instruction prompt. In a 10-shot setting for learning a new SNIPS intent, we show absolute improvements of +2.5 points (IC) and +2.8 points (ST) over data upsampling, and gains of +4.7 points (IC) and +3.2 points (ST) when further combined with Back-Translation. On an internal production dataset for Conversational Agent IC+ST, we show between 7.9% and 25.2% relative improvement over an internal baseline across four languages. To the best of our knowledge, we are the first to use instruction fine-tuning of a large-scale seq2seq model to generate slot-labeled data.
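The abstract describes prompting a fine-tuned seq2seq model with an instruction that names an intent and its slots, and decoding new slot-labeled utterances. Below is a minimal sketch of what such instruction-prompted generation could look like; the prompt wording, the bracketed slot markup, and the checkpoint name (google/mt5-xl as a stand-in for the paper's 5B multilingual model) are illustrative assumptions, not the exact LINGUIST prompt.

```python
# A minimal sketch of instruction-prompted generation of slot-labeled data.
# The prompt format, slot markup, and model checkpoint are assumptions for
# illustration only, not the exact LINGUIST setup.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/mt5-xl"  # stand-in for the paper's 5B multilingual seq2seq model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Hypothetical instruction prompt: name the intent, list its slots, and give
# a labeled exemplar with slot values wrapped in [slot_name value] markup.
prompt = (
    "Generate 1 new utterance for intent BookRestaurant with slots "
    "[restaurant_name], [party_size].\n"
    "Example: book a table at [restaurant_name luigi's] for "
    "[party_size four] people\n"
    "New utterance:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sampling several generations per prompt would yield a pool of synthetic utterances whose bracketed spans can be parsed back into IC+ST training examples.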