Building Production-Quality NLG Models with Minimal Labelled Data

Anonymous

29 Jun 2020 (modified: 29 Jun 2020) | OpenReview Anonymous Preprint Blind Submission | Readers: Everyone

Keywords: NLG, Few shot, low resource, data efficient, BART

TL;DR: Data-efficient approaches for bootstrapping NLG models

Abstract: Natural language generation (NLG) plays an important role in task-oriented dialog systems by providing meaningful and natural responses to user requests. However, training an NLG model that can produce production-quality responses usually requires a large amount of training data. In this paper, we propose two novel data-efficient approaches to bootstrap such a model. First, we propose a template-based approach that leverages a scenario generation framework to create full coverage of possible scenarios and their corresponding synthetic annotations. Second, we combine the pretrained BART model with a bucketing method that groups scenarios based on their dialog act structures. Extensive experiments on three datasets show that our approaches achieve production quality with 10 times less labelled data than a standard NLG dataset.
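
As a rough illustration of the bucketing idea mentioned in the abstract, the sketch below groups scenarios whose dialog act structures match once slot values are ignored. The abstract does not specify how scenarios are represented, so the nested-dict format, the field names (`label`, `children`, `value`), and the helper names `structure_key` and `bucket_scenarios` are all assumptions made for this example and may differ from the authors' actual method.

```python
from collections import defaultdict


def structure_key(node):
    """Return a hashable signature of a scenario tree.

    Assumed (hypothetical) format: each node is a dict with a "label"
    (dialog act or slot name), an optional "value", and optional "children".
    Slot values are ignored, so only the structure is kept.
    Child order is preserved, so reordered children give a different key.
    """
    children = node.get("children", [])
    return (node["label"], tuple(structure_key(child) for child in children))


def bucket_scenarios(scenarios):
    """Group scenarios that share the same dialog act structure."""
    buckets = defaultdict(list)
    for scenario in scenarios:
        buckets[structure_key(scenario)].append(scenario)
    return dict(buckets)


# Hypothetical scenarios: the first two differ only in slot values.
scenarios = [
    {"label": "INFORM", "children": [
        {"label": "city", "value": "Boston"},
        {"label": "temperature", "value": "72"},
    ]},
    {"label": "INFORM", "children": [
        {"label": "city", "value": "Seattle"},
        {"label": "temperature", "value": "55"},
    ]},
    {"label": "REQUEST", "children": [{"label": "city"}]},
]

buckets = bucket_scenarios(scenarios)
for key, members in buckets.items():
    print(key, len(members))
```

Under this assumed representation, one plausible use is to pick a small number of scenarios from each bucket for manual labelling, so that every distinct structure is covered without annotating every value combination.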
