Divide and Conquer: Harnessing Small Agents for Schema Extraction in NL-to-SQL Generation

ACL ARR 2024 December Submission728 Authors

15 Dec 2024 (modified: 05 Feb 2025), ACL ARR 2024 December Submission, CC BY 4.0

Abstract: Large Language Models have shown remarkable promise in various code generation tasks, particularly in SQL generation. However, like other structured query generation tasks, SQL generation depends on extracting the correct schema to achieve good performance. Although schema extraction is a critical component of the process, it has received little attention, especially for Small Language Models (<10B parameters). In this paper, we propose $\texttt{LiteMARS}$ (Lite Multi-Agent Recall Oriented System), the first multi-agent framework to incorporate schema linking that leverages question decomposition for the task of Natural Language to SQL (NL-to-SQL). $\texttt{LiteMARS}$ operates as a multi-agent pipeline with three key stages: $\textit{natural language query decomposition}$, $\textit{schema linking}$, and $\textit{SQL generation}$. Notably, $\texttt{LiteMARS}$ introduces a novel critic-based one-step refinement process that enhances both schema extraction and SQL generation. In our experiments, critic-based refinement improved column recall by 26.6\% and execution accuracy by 73.4\% for NL-to-SQL generation. Further analysis shows that $\texttt{LiteMARS}$ achieves performance comparable to that of Large Language Models such as DeepSeek-Coder-33B.

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: code generation and understanding, prompting

Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data analysis

Languages Studied: English

Submission Number: 728
