Database-Centric NL2SQL

ACL ARR 2025 February Submission5275 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Existing NL2SQL systems rely heavily on LLMs, prompting them with database schema descriptions. However, real-world databases often contain complex schemas, ambiguous naming, and schema-incompliant instances, which makes accurate SQL generation challenging. In addition, the recent trend toward long prompts and multiple candidate queries drives up computational cost. To mitigate these issues, we propose two Database-Centric techniques: View-based Optimization, which simplifies schema representation using database views, and Database-as-a-Tool, which leverages database functionalities to refine SQL queries. Our approach achieves an execution accuracy of 70.47% on the BIRD benchmark, comparable to existing NL2SQL methods, while reducing input tokens by 17× to 374×.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: Semantic Parsing, Table QA, Knowledge Base QA, Reasoning
Languages Studied: English
Submission Number: 5275