Speech-to-SQL Parsing: Error Correction with Multi-modal Representations

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission · Readers: Everyone
Abstract: We study the task of parsing spoken natural language into SQL (speech-to-SQL), where the goal is to map a spoken utterance to the corresponding SQL query. Existing work on SQL parsing has focused on text as input (text-to-SQL). To develop a speech-to-SQL parser, we harness progress in text-to-SQL parsing and automatic speech recognition (ASR). However, ASR remains error-prone, so we propose an error correction method that fixes ASR errors in the context of a DB schema. We present a novel multi-modal representation of text, audio, and DB schema with audio attention and a phoneme prediction auxiliary task. Our experiments show that our method yields better performance, is much faster to train, has greater transparency, and is parser-agnostic compared to baselines that seek to adapt to ASR errors.
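The sketch below (PyTorch) illustrates the kind of architecture the abstract describes: an error-correction model that fuses the ASR transcript, audio features, and DB-schema tokens, attends over the audio, and adds a phoneme-prediction auxiliary head. All module names, dimensions, and the exact fusion strategy are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a schema-aware ASR error-correction model with audio attention
# and a phoneme-prediction auxiliary task. Hyperparameters and fusion details are
# assumptions for illustration only.
import torch
import torch.nn as nn


class SpeechToSQLCorrector(nn.Module):
    def __init__(self, vocab_size, n_phonemes, d_model=256, n_heads=4):
        super().__init__()
        # Separate input projections for each modality.
        self.text_emb = nn.Embedding(vocab_size, d_model)      # ASR transcript tokens
        self.schema_emb = nn.Embedding(vocab_size, d_model)    # DB table/column name tokens
        self.audio_proj = nn.Linear(80, d_model)                # e.g. 80-dim log-mel frames
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # "Audio attention": transcript positions attend over audio frames.
        self.audio_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Main head predicts corrected tokens; auxiliary head predicts phonemes.
        self.correction_head = nn.Linear(d_model, vocab_size)
        self.phoneme_head = nn.Linear(d_model, n_phonemes)

    def forward(self, transcript_ids, audio_feats, schema_ids):
        # Encode transcript jointly with schema tokens so corrections are schema-aware.
        text = self.text_emb(transcript_ids)
        schema = self.schema_emb(schema_ids)
        fused = self.text_encoder(torch.cat([text, schema], dim=1))
        text_states = fused[:, : transcript_ids.size(1)]        # keep transcript positions
        # Cross-attention from each transcript position over the audio frames.
        audio = self.audio_proj(audio_feats)
        attended, _ = self.audio_attn(text_states, audio, audio)
        hidden = text_states + attended
        return self.correction_head(hidden), self.phoneme_head(hidden)


# Toy usage: batch of 2, 12 transcript tokens, 50 audio frames, 8 schema tokens.
model = SpeechToSQLCorrector(vocab_size=1000, n_phonemes=50)
tok_logits, ph_logits = model(
    torch.randint(0, 1000, (2, 12)),
    torch.randn(2, 50, 80),
    torch.randint(0, 1000, (2, 8)),
)
print(tok_logits.shape, ph_logits.shape)  # (2, 12, 1000) and (2, 12, 50)
```

Because the correction model sits in front of any text-to-SQL parser and only rewrites the transcript, this design is parser-agnostic, consistent with the claim in the abstract.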