Abstract: Semantic parsing enables conversational systems to satisfy users' requests through dialogue. Training these models requires annotated dialogues, which are expensive and time-consuming to collect. In this paper, our goal is to use large language models and active learning to replace Wizard-of-Oz (WoZ) collection via crowdsourcing when bootstrapping training data for task-driven semantic parsers. We first demonstrate the utility of utterances generated by GPT-3 when seeded with prior training dialogues, as evaluated by human judges. We then explore the use of parser uncertainty on generated outputs as a selection criterion for annotation and contrast this with a strategy based on Core-sets. Our pipeline leads to more useful examples on average, motivating future work on active generation for bootstrapping semantic parsers.
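The uncertainty-based selection criterion mentioned above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the helper names (`token_entropy`, `select_for_annotation`) and the representation of parser confidence as per-token probability distributions are assumptions for the example. It ranks generated utterances by the mean entropy of the parser's token-level predictions and keeps the top-k for human annotation.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(candidates, k):
    """Rank generated utterances by parser uncertainty; keep the top-k.

    `candidates` is a list of (utterance, parse_token_probs) pairs,
    where parse_token_probs holds the parser's probability distribution
    at each decoding step of its predicted parse (a hypothetical
    interface). Higher mean entropy means a less confident parse, so
    those utterances are the most informative to annotate.
    """
    scored = []
    for utterance, token_probs in candidates:
        entropies = [token_entropy(dist) for dist in token_probs]
        scored.append((sum(entropies) / len(entropies), utterance))
    scored.sort(reverse=True)
    return [utterance for _, utterance in scored[:k]]
```

A Core-set strategy, by contrast, would score candidates by their distance to already-selected examples in an embedding space rather than by parser confidence.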