Syn-QL: Preference Optimization with Synthetic Data for Text-to-SQL

Ruilin Hu; Lu Fan DB; 陈奕哲

05 Nov 2024 (modified: 05 Nov 2024); THU 2024 Fall AML Submission; CC BY 4.0

Keywords: Text-to-SQL; Large Language Models

TL;DR: We present Syn-QL, a framework that significantly enhances open-source LLMs for Text-to-SQL tasks through synthetic data generation and self-training, making them more competitive with closed-source models.

Abstract: This paper addresses the challenge of improving the performance of open-source Large Language Models (LLMs) on Text-to-SQL tasks, where a natural language query is converted into an SQL statement for database interaction. Despite their accessibility and cost-efficiency, open-source LLMs lag behind closed-source models in accuracy. To bridge this gap, we introduce Syn-QL, a framework that uses synthetic data generation and self-training to fine-tune models iteratively. Our method takes a dual-model approach, pairing an SQL Writer with an SQL Verifier to improve the quality of SQL outputs through repeated refinement. Experimental results demonstrate notable performance improvements on established benchmarks, including Spider and BIRD, underscoring Syn-QL’s potential to make open-source LLMs more competitive in Text-to-SQL tasks.

Submission Number: 59
