Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment

Anonymous


16 Oct 2021 (modified: 05 May 2023), ACL ARR 2021 October Blind Submission

Abstract: The challenge of compositional generalization in NLP has recently attracted increasing attention. In particular, many prior works show that neural networks struggle to generalize compositionally when training and test distributions differ. However, most of these works rely on word-level synthetic data or on specific data-splitting methods to introduce compositional biases. In this work, we focus on text-to-SQL and propose a clause-level method for generating compositional examples. We first split the sentences in the Spider text-to-SQL dataset into sub-sentences and annotate each sub-sentence with its corresponding SQL clause, yielding a new dataset, Spider-SS. Building on Spider-SS, we then construct a second dataset, Spider-CG, by substituting and appending Spider-SS sub-sentences, in order to test the ability of models to generalize compositionally. Experiments show that previous models suffer significant performance degradation on Spider-CG, even though every sub-sentence has been seen during training. To address this problem, we adapt the RATSQL+GAP model to the segmented data of Spider-SS, and the results show that this adaptation improves generalization performance.
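As a rough illustration of the substitution and appending operations described in the abstract, the Python sketch below treats each example as a list of segments, each pairing a sub-sentence with the SQL clause it expresses. The `Segment` type, the clause labels, and the `substitute`, `append`, and `render` helpers are hypothetical and are not taken from the Spider-SS or Spider-CG release. The sketch also assumes that both examples share one database schema and skips the validity checks a real generator would need.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical schema: a sub-sentence paired with its SQL clause."""
    text: str
    clause_type: str  # e.g. "SELECT", "WHERE", "ORDER BY"
    sql: str


Example = List[Segment]

# Assumed canonical clause order; the SELECT segment also carries FROM here.
CLAUSE_ORDER = ["SELECT", "WHERE", "GROUP BY", "ORDER BY"]


def _find(example: Example, clause_type: str) -> Segment:
    """Return the segment of the given clause type, or raise ValueError."""
    for seg in example:
        if seg.clause_type == clause_type:
            return seg
    raise ValueError(f"no {clause_type} segment in example")


def substitute(target: Example, donor: Example, clause_type: str) -> Example:
    """Swap target's segment of clause_type for the donor's segment of that type."""
    replacement = _find(donor, clause_type)
    _find(target, clause_type)  # target must already contain this clause
    return [replacement if s.clause_type == clause_type else s for s in target]


def append(target: Example, donor: Example, clause_type: str) -> Example:
    """Append the donor's segment of clause_type to a target that lacks it."""
    if any(s.clause_type == clause_type for s in target):
        raise ValueError(f"target already has a {clause_type} segment")
    return target + [_find(donor, clause_type)]


def render(example: Example) -> Tuple[str, str]:
    """Join sub-sentences in order and SQL clauses in canonical clause order."""
    question = " ".join(s.text for s in example)
    ordered = sorted(example, key=lambda s: CLAUSE_ORDER.index(s.clause_type))
    return question, " ".join(s.sql for s in ordered)


# Two toy examples over the same (assumed) singer table.
a = [Segment("Show the names of singers", "SELECT", "SELECT name FROM singer"),
     Segment("whose age is above 30", "WHERE", "WHERE age > 30")]
b = [Segment("List the countries of singers", "SELECT", "SELECT country FROM singer"),
     Segment("ordered by age", "ORDER BY", "ORDER BY age")]

appended = render(append(a, b, "ORDER BY"))
substituted = render(substitute(a, b, "SELECT"))
print(appended)     # ('Show the names of singers whose age is above 30 ordered by age', 'SELECT name FROM singer WHERE age > 30 ORDER BY age')
print(substituted)  # ('List the countries of singers whose age is above 30', 'SELECT country FROM singer WHERE age > 30')
```

Each generated pair combines only sub-sentences that were individually seen in the source examples. This is the property the abstract relies on when it notes that performance drops on Spider-CG even though every sub-sentence was seen during training.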
