LAIA-SQL: Enhancing Natural Language to SQL Generation in Multi-Table QA via Task Decomposition and Keyword Extraction
Keywords: Natural Language Understanding, Text-to-SQL, Multi-Table QA
Abstract: Natural Language to SQL (NL2SQL) provides an effective solution for multi-table question answering (Table QA), automating data retrieval by transforming simple user queries into SQL commands. It enhances data accessibility and decision-making processes across various industries. Large Language Model (LLM)-based NL2SQL methods have been shown to outperform rule-based and neural network-based approaches. However, existing LLM-based NL2SQL approaches still face challenges such as inaccurate interpretation of user questions, slow retrieval, erroneous SQL generation, and high operational costs. Because there is no dataset specifically designed to evaluate natural language understanding (NLU) in NL2SQL tasks, and no model optimized for understanding user questions in Table QA, we introduce LAIA-NLU, a novel dataset that dissects NLU into task decomposition and keyword extraction. LAIA-NLU contains 1,500 high-quality QA pairs created through manual review. Using this dataset, we developed LAIA-NLUer, a model capable of effectively interpreting user intent in table-based queries. To further improve NL2SQL performance in terms of speed, cost, and accuracy, we also present LAIA-SQL, a retrieval-augmented NL2SQL framework. Experimental results show that LAIA-SQL outperforms state-of-the-art models, improving accuracy to 67.28% on the BIRD dataset while reducing runtime by 52.4% and operational costs by 97%. These improvements demonstrate the potential of our approach to advance multi-table data retrieval and analysis. Our code, dataset, and model will be made publicly available to encourage further research in this field.
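The two-stage NLU described in the abstract (task decomposition plus keyword extraction) feeding a retrieval-augmented SQL generation step can be pictured roughly as in the sketch below. This is a minimal illustration only, not the authors' LAIA-SQL implementation: `call_llm`, `retrieve_schema`, and the prompt wording are hypothetical placeholders.

```python
# Minimal sketch, assuming a generic LLM completion callable and a schema
# retriever; every name and prompt here is a hypothetical placeholder,
# not the LAIA-SQL codebase.
from typing import Callable, List

def decompose_question(question: str, call_llm: Callable[[str], str]) -> List[str]:
    """Task decomposition: split a multi-table question into ordered sub-tasks."""
    prompt = "Break this question into minimal sub-tasks, one per line:\n" + question
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

def extract_keywords(question: str, call_llm: Callable[[str], str]) -> List[str]:
    """Keyword extraction: surface entities/values likely to map to tables, columns, or literals."""
    prompt = "List the key entities and values in this question, comma-separated:\n" + question
    return [kw.strip() for kw in call_llm(prompt).split(",") if kw.strip()]

def answer_with_sql(question: str,
                    retrieve_schema: Callable[[List[str]], str],
                    call_llm: Callable[[str], str]) -> str:
    """NLU first (decomposition + keywords), then retrieval-augmented SQL generation."""
    sub_tasks = decompose_question(question, call_llm)
    keywords = extract_keywords(question, call_llm)
    schema_snippet = retrieve_schema(keywords)  # narrow the schema to relevant tables/columns
    prompt = (
        "Schema:\n" + schema_snippet + "\n\n"
        "Question: " + question + "\n"
        "Sub-tasks: " + "; ".join(sub_tasks) + "\n"
        "Write one SQL query that answers the question."
    )
    return call_llm(prompt)
```

Under this framing, a dataset like LAIA-NLU would supervise the first two steps (decomposition and keyword extraction), while the retrieval step is what the abstract credits for the reported speed and cost reductions.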
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10678