Abstract: Text-to-SQL is a fundamental natural language processing (NLP) task that translates natural language questions about a given relational database into SQL queries. Recently, large language models (LLMs) have become a dominant paradigm for Text-to-SQL. Despite their success, current methods depend heavily on closed-source LLMs with a large number of parameters, such as ChatGPT and GPT4, which incurs significant API costs and raises privacy concerns. A more cost-effective strategy is to fine-tune open-source LLMs with fewer parameters for SQL generation. However, this alternative is hampered by the weaker reasoning capabilities of open-source LLMs, particularly when generating complex SQL queries. To address this issue, we propose a Chain-of-Programs (COP) prompting framework for Text-to-SQL. Unlike conventional Chain-of-Thoughts (COT) prompting, COP uses Pandas code as an intermediate representation that is aligned with the step-wise nature of human thinking. This decomposition transforms a complex SQL query into a series of simple Pandas queries, and each step in the COP can be validated with a Python interpreter. Finally, we use COP prompting to generate the SQL queries. Experiments on the Spider dataset with two open-source LLMs show that our method achieves performance comparable to that of GPT4 in the zero-shot scenario.
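To make the idea of a step-wise Pandas decomposition concrete, the following is a minimal sketch and not the paper's implementation. The toy tables (`singer`, `performance`), the question ("which singers performed in more than one concert?"), the `cop_steps` list, and the `run_chain` helper are all hypothetical names introduced for illustration. The sketch runs each Pandas step through the Python interpreter, so a failing step is reported individually, and then compares the result with a target SQL query executed in an in-memory SQLite database.

```python
import sqlite3

import pandas as pd

# Hypothetical toy database (not taken from Spider).
singer = pd.DataFrame({"singer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
performance = pd.DataFrame(
    {"singer_id": [1, 1, 2, 3, 3, 3], "concert_id": [10, 11, 10, 11, 12, 13]}
)

# Assumed chain of programs: one simple Pandas statement per reasoning step.
cop_steps = [
    "joined = singer.merge(performance, on='singer_id')",
    "counts = joined.groupby(['singer_id', 'name'])['concert_id']"
    ".count().reset_index(name='n')",
    "answer = counts[counts['n'] > 1]['name'].tolist()",
]


def run_chain(steps, tables):
    """Execute each step in a shared namespace; report the first step that fails."""
    env = {"pd": pd, **tables}
    for i, step in enumerate(steps, start=1):
        try:
            exec(step, env)  # interpreter-level validation of a single step
        except Exception as err:
            raise RuntimeError(f"step {i} failed: {step!r}") from err
    return env


env = run_chain(cop_steps, {"singer": singer, "performance": performance})
pandas_answer = sorted(env["answer"])

# A SQL query that the chain above is meant to correspond to.
sql = """
SELECT s.name
FROM singer AS s
JOIN performance AS p ON s.singer_id = p.singer_id
GROUP BY s.singer_id, s.name
HAVING COUNT(*) > 1
"""
conn = sqlite3.connect(":memory:")
singer.to_sql("singer", conn, index=False)
performance.to_sql("performance", conn, index=False)
sql_answer = sorted(row[0] for row in conn.execute(sql))
conn.close()

print(pandas_answer, sql_answer)
```

Each string in `cop_steps` maps to one clause of the SQL query (join, group-and-count, filter), which is the kind of alignment between simple intermediate programs and a complex final query that the abstract describes. How the actual framework prompts the model and checks its steps is not specified here.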