The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models

Karime Maamari; Fadhil Abubaker; Daniel Jaroslawicz; Amine Mhedhbi

Published: 10 Oct 2024, Last Modified: 25 Oct 2024. Venue: TRL @ NeurIPS 2024 (Oral). License: CC BY 4.0

Keywords: Schema Linking, Text-to-SQL, BIRD Benchmark, Natural Language Interfaces to Databases

TL;DR: Make maximal use of the LLM context window by adding schema information, rather than over-filtering seemingly irrelevant elements, to improve accuracy.

Abstract: In Text-to-SQL pipelines, schema linking is used to retrieve the tables and columns that are relevant to the user's natural language query. However, inaccuracies in schema linking can lead to the exclusion of crucial information, which in turn adversely affects SQL generation. In this work, we revisit the need for schema linking when using the latest generation of large language models (LLMs). We find that newer models can accurately identify the relevant schema elements during SQL generation, even in the presence of substantial irrelevant data. Consequently, our Text-to-SQL pipeline forgoes schema linking when the entire database schema fits within the model's context window. This approach eliminates errors due to faulty schema linking by ensuring that no schema information is omitted. Furthermore, we introduce techniques such as augmentation, selection, and correction, which improve Text-to-SQL accuracy without the risk of filtering out essential schema information. Our approach ranks first on the BIRD benchmark, achieving an accuracy of 71.83%.
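The decision rule described in the abstract (skip schema linking whenever the full schema fits in the context window, and fall back to linking only otherwise) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a SQLite database, estimates tokens with a crude 4-characters-per-token heuristic instead of the model's real tokenizer, reserves a hypothetical `reserved_tokens` margin for the question and instructions, and uses a toy, hypothetical `keyword_linker` as the fallback.

```python
import re
import sqlite3
from typing import Callable, List

# Hypothetical type: a schema linker maps (question, DDL statements) to a filtered subset.
SchemaLinker = Callable[[str, List[str]], List[str]]


def schema_ddl(conn: sqlite3.Connection) -> List[str]:
    """Return the CREATE TABLE statement of every table in a SQLite database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def estimate_tokens(text: str) -> int:
    """Crude token estimate (assumption): roughly 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def build_schema_context(
    question: str,
    ddl: List[str],
    context_budget: int,
    reserved_tokens: int,
    link_schema: SchemaLinker,
) -> str:
    """Pass the full schema when it fits; otherwise fall back to schema linking."""
    full_schema = "\n\n".join(ddl)
    if estimate_tokens(full_schema) + reserved_tokens <= context_budget:
        # Whole schema fits: skip linking so no table or column can be filtered out.
        return full_schema
    # Schema too large for the window: only now apply a (possibly lossy) linker.
    return "\n\n".join(link_schema(question, ddl))


def keyword_linker(question: str, ddl: List[str]) -> List[str]:
    """Toy fallback linker (hypothetical): keep tables whose DDL shares a word with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))
    return [stmt for stmt in ddl if q_words & set(re.findall(r"\w+", stmt.lower()))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT, county TEXT)")
    conn.execute("CREATE TABLE scores (school_id INTEGER, avg_math REAL)")
    ddl = schema_ddl(conn)
    question = "How many schools are in Alameda county?"

    # Generous budget: both tables are passed through unfiltered.
    print(build_schema_context(question, ddl, 128_000, 4_000, keyword_linker))
    # Tiny budget: falls back to the linker, which keeps only the schools table.
    print(build_schema_context(question, ddl, 10, 0, keyword_linker))
```

The design choice this mirrors is that linking is treated as a last resort: when the budget check passes, the model sees every table and column, so a weak linker (like the keyword matcher here, which would drop `scores` even if a question needed it) can never remove something the query depends on.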

Submission Number: 43