Structure-Guided Large Language Models for Text-to-SQL Generation

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, License: CC BY 4.0
Abstract: Recent advances in large language models (LLMs) have shown promise in bridging the gap between natural-language queries and database management systems, enabling users without SQL expertise to interact with databases. However, LLMs often struggle to fully understand user intentions and to exploit the complex structures of databases. Decomposition-based methods have been proposed to improve LLM performance on complex tasks, but decomposing SQL generation into subtasks is non-trivial because of the declarative structure of SQL syntax and the intricate connections between query concepts and database elements. In this paper, we propose a novel Structure GUided text-to-SQL framework (SGU-SQL) that incorporates syntax-based prompting to enhance the SQL generation capabilities of LLMs. Specifically, SGU-SQL establishes structure-aware links between user queries and the database schema, then recursively decomposes the complex generation task using syntax-based prompting to guide LLMs in incrementally constructing the target SQL queries. Extensive experiments on two benchmark datasets demonstrate that SGU-SQL consistently outperforms state-of-the-art text-to-SQL baselines.
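The abstract describes recursively decomposing the generation task with syntax-based prompting so that the LLM builds the target SQL incrementally. The paper's actual prompts and decomposition rules are not given on this page, so the Python sketch below only illustrates the control flow under stated assumptions: a hand-built clause tree (`ClauseNode`), a hypothetical prompt format (`build_prompt`), and a rule-based `fake_llm` standing in for a real model call. None of these names come from SGU-SQL. Sub-clauses are solved first (post-order), and each parent prompt carries the fragments already generated for its children.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ClauseNode:
    """One node of a hand-built (hypothetical) SQL syntax tree."""
    clause: str   # clause type, e.g. "SELECT", "WHERE", "SUBQUERY"
    intent: str   # natural-language fragment the clause must express
    children: List["ClauseNode"] = field(default_factory=list)


def build_prompt(node: ClauseNode, sub_results: List[str]) -> str:
    """Hypothetical clause-level prompt that carries already-generated fragments."""
    lines = [f"Clause type: {node.clause}", f"Intent: {node.intent}"]
    for i, frag in enumerate(sub_results):
        lines.append(f"Sub-result {i}: {frag}")
    return "\n".join(lines)


def generate_incrementally(node: ClauseNode, llm: Callable[[str], str],
                           trace: List[str]) -> str:
    """Post-order traversal: solve sub-clauses first, then the parent clause."""
    sub_results = [generate_incrementally(c, llm, trace) for c in node.children]
    prompt = build_prompt(node, sub_results)
    trace.append(prompt)  # record prompts in the order they are issued
    return llm(prompt)


# Rule-based stand-in for an LLM: look up a template by intent, fill in sub-results.
CANNED = {
    "student names": "SELECT name FROM student",
    "average student age": "(SELECT AVG(age) FROM student)",
    "age above the average": "WHERE age > {0}",
    "names of students older than the average age": "{0} {1}",
}


def fake_llm(prompt: str) -> str:
    lines = prompt.splitlines()
    intent = lines[1][len("Intent: "):]
    subs = [line.split(": ", 1)[1] for line in lines[2:]]
    return CANNED[intent].format(*subs)


tree = ClauseNode("QUERY", "names of students older than the average age", [
    ClauseNode("SELECT", "student names"),
    ClauseNode("WHERE", "age above the average",
               [ClauseNode("SUBQUERY", "average student age")]),
])
trace: List[str] = []
sql = generate_incrementally(tree, fake_llm, trace)
print(sql)
# → SELECT name FROM student WHERE age > (SELECT AVG(age) FROM student)
```

Post-order traversal is chosen here so that a nested subquery already exists before the WHERE clause that references it is generated. A real system would replace `fake_llm` with an LLM call and derive the tree from the question instead of writing it by hand.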
Lay Summary: We identify limitations of LLM-based Text-to-SQL models and introduce SGU-SQL, which breaks down the complex generation task in a syntax-aware manner. This ensures that the generated queries maintain both semantic accuracy (correctly capturing user intentions) and syntactic correctness (following proper SQL structure). SGU-SQL first builds graph-based representations of user queries and database structures, then links queries to databases through dual-graph encoding. It then applies tailored structure-decomposed generation strategies: queries are decomposed with syntax trees, and an LLM incrementally generates accurate SQL. On standard benchmarks, SGU-SQL consistently outperformed existing baseline models, making it easier for non-experts to retrieve accurate data from databases using natural language.
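The summary mentions linking queries and databases through graph structure but does not detail the dual-graph encoding. The Python sketch below substitutes a much simpler, non-learned heuristic to illustrate why schema structure matters for linking: question words are matched lexically to the tables and columns of a hypothetical toy schema (`SCHEMA`, `FOREIGN_KEYS`), and breadth-first search over the foreign-key graph then recovers the tables needed to join the linked ones. All names and the matching rule are assumptions for illustration; this is not SGU-SQL's encoder.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy schema: table -> columns, plus foreign-key links between tables.
SCHEMA: Dict[str, List[str]] = {
    "student": ["id", "name", "age", "dept_id"],
    "department": ["id", "dept_name", "building"],
    "enrollment": ["student_id", "course_id"],
    "course": ["id", "title"],
}
FOREIGN_KEYS: List[Tuple[str, str]] = [
    ("student", "department"),
    ("enrollment", "student"),
    ("enrollment", "course"),
]


def build_schema_graph(fks: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected table graph: an edge means the two tables can be joined."""
    graph: Dict[str, Set[str]] = {t: set() for t in SCHEMA}
    for a, b in fks:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def normalize(tok: str) -> str:
    """Lowercase, strip punctuation, and crudely drop a plural 's'."""
    tok = tok.strip("?,.").lower()
    return tok[:-1] if tok.endswith("s") else tok


def link_question(question: str) -> Dict[str, Set[str]]:
    """Naive lexical linking of question words to tables and columns."""
    tokens = {normalize(t) for t in question.split()}
    text = question.lower()
    links: Dict[str, Set[str]] = {}
    for table, cols in SCHEMA.items():
        # snake_case columns are matched as phrases, single words as tokens.
        hits = {c for c in cols
                if (c.replace("_", " ") in text if "_" in c else c in tokens)}
        if table in tokens or hits:
            links[table] = hits
    return links


def shortest_path(graph: Dict[str, Set[str]], src: str, dst: str) -> List[str]:
    """Breadth-first search for a shortest join path from src to dst."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nb in sorted(graph[node]):
            if nb not in prev:
                prev[nb] = node
                queue.append(nb)
    if dst not in prev:
        return []  # unreachable: no join path exists
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def join_subgraph(graph: Dict[str, Set[str]], tables: Iterable[str]) -> Set[str]:
    """Union of shortest paths from one linked table to every other one."""
    ordered = sorted(tables)
    if not ordered:
        return set()
    keep = {ordered[0]}
    for t in ordered[1:]:
        keep.update(shortest_path(graph, ordered[0], t))
    return keep


graph = build_schema_graph(FOREIGN_KEYS)
links = link_question(
    "What are the titles of courses taken by students in the Physics department?")
tables = join_subgraph(graph, links)
print(links)
print(sorted(tables))  # → ['course', 'department', 'enrollment', 'student']
```

The `enrollment` table is never mentioned in the question, yet it is pulled in because it is the only join path between `course` and `student`; purely lexical linking would miss it. The union of shortest paths is a cheap approximation of a minimal connecting subgraph, not an optimal one.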
Primary Area: Deep Learning->Large Language Models
Keywords: Text-to-SQL, large language model, structure learning
Submission Number: 14400