Improving Spoken Semantic Parsing using Unpaired Text from Textual Corpora and Large Language Model Prompting


16 Dec 2023 · ACL ARR 2023 December Blind Submission
TL;DR: Unpaired text data can improve spoken semantic parsing in existing and new domains, and LLMs can be prompted to synthesize unpaired text when none is available.
Abstract: Spoken semantic parsing (SSP) involves generating machine-comprehensible parses from input speech. Training robust models for existing application domains represented in training data, or extending to new domains, requires corresponding triplets of speech, transcript, and semantic parse, which are expensive to obtain. In this paper, we address this challenge by examining methods that can use or generate transcript-semantic parse data (unpaired text) without corresponding speech. First, when unpaired text is drawn from existing textual corpora, we compare Joint Audio Text (JAT) and Text-to-Speech (TTS) as ways of generating speech representations for unpaired text. Experiments on the STOP dataset show that unpaired text from existing and new domains improves performance by 2% and 30% absolute Exact Match (EM), respectively. Second, when unpaired text is not available in existing textual corpora, Large Language Models (LLMs) can be prompted to generate it for existing and new domains, and JAT or TTS can then be used with the generated text to improve SSP. Prior work has mostly used LLMs to generate synthetic data for classification tasks. Here, we introduce multiple prompting strategies that obtain synthetic data for existing and new domains based on intent classes, intent-slot combinations, and example transcripts and parses. Experiments show that using synthetic parse data with JAT improves SSP performance on STOP by 1.4% absolute EM for existing domains, while using synthetic parse data with TTS improves EM on a new held-out domain by 2.6% absolute.
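The three prompting strategies named in the abstract (intent classes, intent-slot combinations, and example transcript-parse pairs) could be templated roughly as in the sketch below. The function names, prompt wording, and the `[IN:...]`/`[SL:...]` bracketing (the convention used by TOP-style semantic parses such as STOP's) are illustrative assumptions, not the paper's actual prompts:

```python
# Minimal sketch of the three prompting strategies for synthetic
# transcript-parse generation. All names and wording are hypothetical.

def intent_prompt(domain: str, intent: str, n: int = 5) -> str:
    """Prompt an LLM for n transcripts and parses for one intent class."""
    return (
        f"Generate {n} user utterances for the '{domain}' domain "
        f"with intent [IN:{intent}], each followed by its semantic parse."
    )

def intent_slot_prompt(domain: str, intent: str, slots: list[str], n: int = 5) -> str:
    """Prompt targeting a specific intent-slot combination."""
    slot_str = ", ".join(f"[SL:{s}]" for s in slots)
    return (
        f"Generate {n} utterances for the '{domain}' domain with intent "
        f"[IN:{intent}] using slots {slot_str}, each followed by its parse."
    )

def exemplar_prompt(examples: list[tuple[str, str]], n: int = 5) -> str:
    """Few-shot prompt seeded with example transcript-parse pairs."""
    shots = "\n".join(f"Transcript: {t}\nParse: {p}" for t, p in examples)
    return f"{shots}\nGenerate {n} more transcript-parse pairs in the same style."
```

The LLM's completions would then be parsed into transcript-parse pairs and fed to JAT or TTS as the unpaired text described above.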
Paper Type: long
Research Area: Speech recognition, text-to-speech and spoken language understanding
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English