Enhancing NLIDBs: Advancing from Text-to-SQL to Text-to-Multi-SQL

ACL ARR 2025 February Submission7860 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Text-to-SQL is a key component of Natural Language Interfaces to Databases (NLIDBs), helping non-technical users query databases. However, it has a significant limitation: it assumes users know that SQL queries return data in table format. This often causes problems when users ask for data that a single query cannot return. To address this, we propose Text-to-Multi-SQL, which uses multiple SQL queries to meet complex user needs. We created Spider$\mathbb{S}$, the first Text-to-Multi-SQL dataset, based on the Spider dataset. It includes 7,000 training examples, 1,024 validation examples, and 2,147 test examples, built from a mix of manually written and GPT-4o-generated data. Experiments show that current models struggle to generate multiple SQL queries, and even advanced models perform worse on Spider$\mathbb{S}$ (accuracy drops of 0.167 to 0.327). We also found that models are very sensitive to prompts: switching from "generate one SQL" to "generate one or multiple SQL" significantly reduces their performance.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Text-to-SQL
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 7860
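
To make the single-table limitation described in the abstract concrete, the following is a minimal sketch of a request that does not fit one result table and is answered with two queries instead. The schema, rows, question, and the `run_multi_sql` helper are all hypothetical and invented for illustration; they are not taken from Spider$\mathbb{S}$ or from the authors' code.

```python
import sqlite3


def run_multi_sql(conn, queries):
    """Run each SQL query and return one result table (list of rows) per query."""
    return [conn.execute(q).fetchall() for q in queries]


# Hypothetical toy schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?)",
    [("Ana", "Spain", 30), ("Luc", "France", 40), ("Eva", "Spain", 60)],
)

# Hypothetical question: "List every singer's name, and also give the
# average age per country." The two parts have different columns and row
# counts, so they are expressed here as two separate queries.
queries = [
    "SELECT name FROM singer ORDER BY name",
    "SELECT country, AVG(age) FROM singer GROUP BY country ORDER BY country",
]

results = run_multi_sql(conn, queries)
for table in results:
    print(table)
```

Each entry in `results` is a separate table, which is the shape a Text-to-Multi-SQL system would presumably hand back to the user rather than forcing both answers into one query.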