Abstract: Existing NL2SQL systems rely heavily on LLMs, prompting them with database schema descriptions. However, real-world databases often contain complex schemas, ambiguous naming, and data instances that do not conform to the schema, which makes accurate SQL generation challenging. Moreover, the recent trend of using long prompts and generating multiple candidate queries incurs high computational costs.
To mitigate these issues, we propose two Database-Centric techniques: View-based Optimization, which simplifies the schema representation using database views, and Database-as-a-Tool, which leverages database functionalities to refine SQL queries. Our approach achieves an execution accuracy of 70.47% on the BIRD benchmark, comparable to existing NL2SQL methods, while reducing input tokens by 17× to 374×.
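The two techniques can be illustrated on a toy SQLite database. This is a minimal sketch under stated assumptions, not the authors' implementation: the table `frpm`, its abbreviated columns, the view `school_meals`, and the helpers `schema_prompt` and `check_with_database` are all hypothetical. The view renames cryptic columns, drops unused ones, and precomputes a derived rate, so the schema text given to the LLM is shorter and clearer. The checker executes a candidate query and returns either the rows or the database's own error message or an empty-result flag as feedback for a refinement step.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical toy database with abbreviated, hard-to-interpret column names.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE frpm (cds TEXT, sch_nm TEXT, enr_k12 INTEGER,
                           frpm_cnt INTEGER, upd_ts TEXT);
        INSERT INTO frpm VALUES ('01', 'Alpha High', 1000, 400, '2020-01-01');
        INSERT INTO frpm VALUES ('02', 'Beta Middle', 500, 300, '2020-01-01');
        """
    )
    return conn


def create_simplified_view(conn: sqlite3.Connection) -> None:
    # View-based simplification (assumed form): descriptive names, irrelevant
    # columns dropped, and a derived value precomputed so the LLM need not.
    conn.execute(
        """
        CREATE VIEW school_meals AS
        SELECT sch_nm AS school_name,
               enr_k12 AS enrollment,
               frpm_cnt AS free_meal_count,
               CAST(frpm_cnt AS REAL) / enr_k12 AS free_meal_rate
        FROM frpm
        """
    )


def schema_prompt(conn: sqlite3.Connection, name: str) -> str:
    # Compact one-line schema text "name(col, ...)"; PRAGMA table_info also
    # works on views. `name` is assumed trusted here (not user input).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({name})")]
    return f"{name}({', '.join(cols)})"


def check_with_database(conn: sqlite3.Connection, sql: str):
    # Database-as-a-tool (assumed form): run the candidate query and return
    # (True, rows) on success, or (False, feedback) for an LLM refinement step.
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return False, f"error: {exc}"
    if not rows:
        return False, "empty result"
    return True, rows


conn = build_demo_db()
create_simplified_view(conn)
print(schema_prompt(conn, "frpm"))          # frpm(cds, sch_nm, enr_k12, frpm_cnt, upd_ts)
print(schema_prompt(conn, "school_meals"))  # school_meals(school_name, enrollment, free_meal_count, free_meal_rate)
# A query using a non-existent column yields (False, <SQLite error message>).
print(check_with_database(conn, "SELECT school FROM school_meals"))
```

Letting the database reject invalid identifiers keeps the prompt short, since the schema does not need to be restated exhaustively. How the paper actually constructs its views and feeds database feedback back to the LLM is not specified in this abstract.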
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: Semantic Parsing, Table QA, Knowledge Base QA, Reasoning
Languages Studied: English
Submission Number: 5275