Abstract: Even sophisticated Text-to-SQL methods frequently produce errors such as schema-linking errors, join errors, nested-query errors, and GROUP BY errors. To mitigate these, it is crucial to filter out unnecessary tables and columns so that the language model focuses on the relevant ones. Previous methods have tried either to rank tables and columns by relevance or to identify the necessary elements directly, but these approaches suffer from long training times, high GPT-4 token costs, or poor schema-linking performance. We propose a two-step schema-linking method: first, generate an initial SQL query using the full database schema; then, extract the relevant tables and columns from that query to form a concise schema. Evaluated with Code Llama and GPT-4 on the Spider dataset, this method achieves the best performance among the mainstream methods compared, reducing errors and improving the efficiency of SQL generation.
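The two-step idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how tables and columns are extracted, so the hypothetical `prune_schema` helper uses naive case-insensitive identifier matching after stripping single-quoted string literals. `generate_sql` is a hypothetical stand-in for a Code Llama or GPT-4 call. Running a second generation pass over the concise schema is also an assumption about how that schema is used.

```python
import re
from typing import Callable, Dict, List

Schema = Dict[str, List[str]]  # table name -> column names


def prune_schema(sql: str, schema: Schema) -> Schema:
    """Hypothetical helper: keep only the tables and columns named in `sql`.

    Naive assumption: a table or column is "used" if its name appears as an
    identifier token (case-insensitive) outside single-quoted string literals.
    """
    # Drop string literals so values like 'France' are not mistaken for names.
    without_literals = re.sub(r"'[^']*'", " ", sql)
    tokens = {t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", without_literals)}
    pruned: Schema = {}
    for table, columns in schema.items():
        if table.lower() in tokens:
            # Keep only this table's columns that the draft query mentions.
            pruned[table] = [c for c in columns if c.lower() in tokens]
    return pruned


def two_step_schema_linking(
    question: str,
    schema: Schema,
    generate_sql: Callable[[str, Schema], str],  # hypothetical LLM wrapper
) -> str:
    # Step 1: draft a query against the full schema.
    draft = generate_sql(question, schema)
    # Step 2: extract the referenced tables/columns into a concise schema.
    concise = prune_schema(draft, schema)
    # Assumed final pass: regenerate using only the concise schema.
    return generate_sql(question, concise)


schema = {
    "singer": ["singer_id", "name", "country", "age"],
    "concert": ["concert_id", "concert_name", "year", "stadium_id"],
    "stadium": ["stadium_id", "location", "capacity"],
}
draft_sql = "SELECT name FROM singer WHERE country = 'France' AND age > 30"
print(prune_schema(draft_sql, schema))
# {'singer': ['name', 'country', 'age']}
```

Plain token matching is recall-oriented: a column name shared by two referenced tables is kept in both, which errs toward including too much rather than dropping something the query needs. A real system would more likely parse the SQL and resolve aliases.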