SynSQL: Synthetic Database Generation for Robust Evaluation of Text-to-SQL Systems

Mohammadamin Habibollah; Davood Rafiei

20 Sept 2025 (modified: 11 Feb 2026); Submitted to ICLR 2026; CC BY 4.0

Keywords: Text-to-SQL, Test Data Generation, Synthetic Data Generation, Natural Language Interfaces to Databases, SQL Testing

TL;DR: SynSQL is an LLM-driven framework that generates test databases from natural language questions and database schemas for evaluating text-to-SQL systems. It complements existing human-curated datasets and outperforms prior test database generation methods.

Abstract: A central challenge in test-time scaling for text-to-SQL is generating test databases that can reliably validate arbitrary queries, yet existing tools remain narrow in scope and limited in capability. We introduce SynSQL, a framework for synthesizing test databases conditioned on natural language questions and schema structure. Unlike prior approaches that generate data from gold queries, SynSQL leverages large language models to generate tables directly from question–schema alignment, while remaining compatible with gold queries when available for evaluation. The framework consists of a schema selector, a synthesizer, and a critic with iterative refinement, which jointly align semantic cues from the question with structural constraints from the schema to guide database synthesis. Experiments on the Spider and BIRD benchmarks demonstrate that SynSQL produces realistic, constraint-respecting databases that effectively stress-test text-to-SQL models. SynSQL not only complements the coverage of human-curated benchmarks but also outperforms prior test database generation methods across diverse schema complexities. On Spider, SynSQL achieves a 93.04% success rate, surpassing the original human-authored dataset (92.55%), and on BIRD it attains a 79.23% agreement rate, substantially higher than prior automated methods, all while operating without access to gold queries during data generation.
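The abstract describes a pipeline of a schema selector, a synthesizer, and a critic with iterative refinement, followed by execution-based evaluation of queries on the synthesized database. The sketch below illustrates that control flow under stated assumptions; it is not SynSQL's implementation. All names (`select_schema`, `synthesize`, `critique`, `synthesize_db`, `same_result`) and the toy schema are hypothetical. The LLM calls are replaced by deterministic stubs, the selector is a simple keyword match, the critic checks only foreign-key integrity via SQLite's `PRAGMA foreign_key_check`, and query results are compared as order-insensitive multisets. As in the abstract, the gold query is not used during generation, only during evaluation.

```python
import sqlite3
from collections import Counter

# Hypothetical toy schema (not from Spider or BIRD), keyed by table name.
SCHEMA = {
    "singer": "CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT)",
    "concert": "CREATE TABLE concert (concert_id INTEGER PRIMARY KEY, "
               "singer_id INTEGER REFERENCES singer(singer_id), year INTEGER)",
    "stadium": "CREATE TABLE stadium (stadium_id INTEGER PRIMARY KEY, capacity INTEGER)",
}


def select_schema(question, schema):
    """Schema-selector stand-in (assumption): keep tables whose name occurs in the question."""
    q = question.lower()
    return {name: ddl for name, ddl in schema.items() if name in q}


def synthesize(tables, feedback):
    """Synthesizer stand-in for an LLM call. The draft deliberately contains a
    dangling foreign key (singer_id 99); rows flagged by the critic are dropped."""
    draft = {
        "singer": [(1, "Ana", "France"), (2, "Ben", "Canada"), (3, "Chloe", "France")],
        "concert": [(10, 1, 2016), (11, 3, 2014), (12, 99, 2018)],
    }
    # The first column is an INTEGER PRIMARY KEY, so it equals SQLite's rowid.
    return {t: [r for r in draft.get(t, []) if (t, r[0]) not in feedback] for t in tables}


def build_db(tables, rows):
    """Materialize the synthesized rows in an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = OFF")  # let the critic, not the engine, catch violations
    for name, ddl in tables.items():
        con.execute(ddl)
        if rows[name]:
            marks = ", ".join("?" * len(rows[name][0]))
            con.executemany(f"INSERT INTO {name} VALUES ({marks})", rows[name])
    return con


def critique(con):
    """Critic stand-in (assumption): report (table, rowid) pairs violating a foreign key."""
    return [(table, rowid) for table, rowid, _parent, _fkid in con.execute("PRAGMA foreign_key_check")]


def synthesize_db(question, schema, max_rounds=3):
    """Select, synthesize, critique; feed violations back until none remain."""
    tables = select_schema(question, schema)
    feedback = set()
    for _ in range(max_rounds):
        con = build_db(tables, synthesize(tables, feedback))
        violations = critique(con)
        if not violations:
            return con
        feedback.update(violations)
        con.close()
    raise RuntimeError("no constraint-respecting database within max_rounds")


def same_result(con, predicted_sql, gold_sql):
    """Execution-based check: compare result rows as multisets (order-insensitive)."""
    return Counter(con.execute(predicted_sql).fetchall()) == Counter(con.execute(gold_sql).fetchall())


QUESTION = "How many singers from France performed in a concert after 2015?"
GOLD = ("SELECT COUNT(*) FROM singer s JOIN concert c ON s.singer_id = c.singer_id "
        "WHERE s.country = 'France' AND c.year > 2015")
EQUIVALENT = ("SELECT COUNT(*) FROM concert WHERE year > 2015 AND singer_id IN "
              "(SELECT singer_id FROM singer WHERE country = 'France')")
BUGGY = "SELECT COUNT(*) FROM singer WHERE country = 'France'"  # ignores the concert condition

if __name__ == "__main__":
    db = synthesize_db(QUESTION, SCHEMA)  # the gold query is not consulted here
    print(same_result(db, EQUIVALENT, GOLD), same_result(db, BUGGY, GOLD))  # prints: True False
```

In this toy run the first draft is rejected because of the dangling `singer_id 99`, and the refined database then separates the equivalent query (count 1) from the buggy one (count 2). A real critic would presumably check many more constraints (types, uniqueness, value plausibility, coverage of the question's conditions), which is where an LLM-based critic would go beyond this rule-based stand-in.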

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 23179
