Abstract: Recent studies have shown that prompting large language models (LLMs) for Text2SQL can achieve promising performance. However, the task remains challenging due to the difficulty of aligning complex natural language semantics with database schemas. In this paper, we present a novel prompting approach for Text2SQL via Gradual SQL Refinement (GSR). It consists of three sequential prompting steps: 1) Clause Decomposition, which breaks a complex natural language question down into simpler clauses to facilitate natural language interpretation; 2) SQL-driven Schema Linking, which improves schema linking through targeted schema-information retrieval based on the preliminary SQL generated in the first step; 3) SQL Execution Refinement, which further refines the SQL generated in the second step based on the results of its execution. GSR is a gradual prompting approach in that it begins with a single SQL query and then refines it step by step based on SQL analysis and execution. We have validated the efficacy of GSR in an empirical study on benchmark datasets. Our experiments show that its execution accuracies on BIRD and Spider are 69.26% and 87.7%, respectively, when using GPT-4o. With only a few prompts, GSR ranks 11th on the BIRD benchmark, considerably outperforming existing single-candidate alternatives. Its performance is highly competitive even against existing approaches based on model fine-tuning or multiple-candidate generation, which require considerably more prompts and token consumption.
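The three-step pipeline described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the prompt wordings, the `llm` callable, and the keyword-match schema retrieval are all hypothetical placeholders standing in for the paper's actual prompts and retrieval method.

```python
import sqlite3

def clause_decomposition(question: str, llm) -> str:
    # Step 1: ask the model to break the question into simpler clauses
    # and draft a preliminary SQL query from them.
    prompt = (
        "Decompose the question into simple clauses, "
        f"then write one SQL query answering it:\n{question}"
    )
    return llm(prompt)

def schema_linking(sql: str, schema: dict, llm) -> str:
    # Step 2: use the preliminary SQL to retrieve only the relevant
    # schema elements, then ask the model to revise the query.
    # (Hypothetical retrieval: keep tables named in the draft SQL.)
    linked = {t: cols for t, cols in schema.items() if t.lower() in sql.lower()}
    prompt = f"Revise this SQL against the schema {linked}:\n{sql}"
    return llm(prompt)

def execution_refinement(sql: str, conn, llm) -> str:
    # Step 3: execute the SQL; if execution fails, feed the error
    # message back to the model and ask for a refined query.
    try:
        conn.execute(sql).fetchall()
        return sql  # executable as-is: keep the query
    except sqlite3.Error as err:
        return llm(f"The SQL failed with error '{err}'. Fix it:\n{sql}")

def gsr(question: str, schema: dict, conn, llm) -> str:
    # Gradual refinement: one query, revised at each step.
    sql = clause_decomposition(question, llm)
    sql = schema_linking(sql, schema, llm)
    return execution_refinement(sql, conn, llm)
```

The `llm` argument is any callable mapping a prompt string to a model response, so the sketch is agnostic to the underlying API (the paper's experiments use GPT-4o).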
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: large language model, text-to-sql, prompt design
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 3590