Abstract: In-context learning with large language models (LLMs) is the current mainstream method for text-to-SQL.
Previous studies have explored selecting relevant demonstrations from a human-labeled demonstration pool, but such pools often lack diversity and are costly to label.
In this work, we focus on measuring and enhancing the diversity of the text-to-SQL demonstration pool.
First, we introduce a diversity metric and show that the diversity of existing labeled data can be further enhanced.
Motivated by these findings, we propose **Fused**, which iteratively fuses demonstrations to build a diverse demonstration pool, starting either from human-labeled data or from scratch with LLMs, thereby reducing labeling costs.
**Fused** achieves an average improvement of 3.2% when building on existing labeled data and 5.0% when starting from scratch across several mainstream datasets, demonstrating its effectiveness.
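To make the abstract's two ideas (a pool-diversity metric and iterative demonstration fusion) concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than the paper's actual method: the character-hash embedding, the average-pairwise-cosine-distance diversity proxy, and the `llm_fuse` stub (which stands in for a real LLM call) are all hypothetical.

```python
import itertools
import random

import numpy as np


def embed(demo: str) -> np.ndarray:
    # Stand-in embedding: a real implementation would use a sentence
    # encoder; here characters are hashed into a small vector so the
    # sketch runs without external dependencies.
    vec = np.zeros(64)
    for i, ch in enumerate(demo):
        vec[i % 64] += ord(ch)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def pool_diversity(pool: list[str]) -> float:
    """Average pairwise cosine distance over the pool: a common
    diversity proxy, not necessarily the paper's exact metric."""
    dists = [1.0 - float(embed(a) @ embed(b))
             for a, b in itertools.combinations(pool, 2)]
    return sum(dists) / len(dists)


def llm_fuse(demo_a: str, demo_b: str) -> str:
    # Stand-in for an LLM call that fuses two demonstrations into a
    # new one; replace with a real model client in practice.
    return f"FUSED({demo_a} | {demo_b})"


def fuse_pool(pool: list[str], iterations: int = 3, per_iter: int = 8) -> list[str]:
    """Iteratively grow the demonstration pool by fusing sampled pairs."""
    for _ in range(iterations):
        pool = pool + [llm_fuse(*random.sample(pool, 2)) for _ in range(per_iter)]
    return pool


seed = ["How many users? | SELECT COUNT(*) FROM users",
        "List order dates | SELECT order_date FROM orders"]
grown = fuse_pool(seed)
print(pool_diversity(seed), pool_diversity(grown))
```

Under this reading, each fusion round samples parent demonstrations and asks the model to combine their structures, so later rounds mix increasingly varied parents; comparing `pool_diversity` before and after shows whether the pool actually became more diverse.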
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: Syntax: Parsing, NLP Applications
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2492