Abstract: In-context learning with large language models (LLMs) is the current mainstream method for text-to-SQL.
Previous studies have explored selecting relevant demonstrations from a human-labeled demonstration pool, but such pools often lack diversity and are costly to label.
In this work, we focus on measuring and enhancing the diversity of the text-to-SQL demonstration pool.
First, we introduce a diversity metric and show that the diversity of existing labeled data can be further enhanced.
Motivated by these findings, we propose **Fused**, which iteratively fuses demonstrations to build a diverse demonstration pool, starting either from human-labeled data or from scratch with LLMs, thereby reducing labeling costs.
**Fused** achieves an average improvement of 3.2% when building on existing labeled data and 5.0% when starting from scratch across several mainstream datasets, demonstrating its effectiveness.
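To make the abstract's two ideas (a pool-diversity metric and iterative demonstration fusion) concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than the paper's actual method: the character-hash embedding, the average-pairwise-cosine-distance diversity proxy, and the `llm_fuse` stub (which stands in for a real LLM call) are all hypothetical.

```python
import itertools
import random

import numpy as np


def embed(demo: str) -> np.ndarray:
    # Stand-in embedding: a real implementation would use a sentence
    # encoder; here characters are hashed into a small vector so the
    # sketch runs without external dependencies.
    vec = np.zeros(64)
    for i, ch in enumerate(demo):
        vec[i % 64] += ord(ch)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def pool_diversity(pool: list[str]) -> float:
    """Average pairwise cosine distance over the pool: a common
    diversity proxy, not necessarily the paper's exact metric."""
    dists = [1.0 - float(embed(a) @ embed(b))
             for a, b in itertools.combinations(pool, 2)]
    return sum(dists) / len(dists)


def llm_fuse(demo_a: str, demo_b: str) -> str:
    # Stand-in for an LLM call that fuses two demonstrations into a
    # new one; replace with a real model client in practice.
    return f"FUSED({demo_a} | {demo_b})"


def fuse_pool(pool: list[str], iterations: int = 3, per_iter: int = 8) -> list[str]:
    """Iteratively grow the demonstration pool by fusing sampled pairs."""
    for _ in range(iterations):
        pool = pool + [llm_fuse(*random.sample(pool, 2)) for _ in range(per_iter)]
    return pool


seed = ["How many users? | SELECT COUNT(*) FROM users",
        "List order dates | SELECT order_date FROM orders"]
grown = fuse_pool(seed)
print(pool_diversity(seed), pool_diversity(grown))
```

Under this reading, each fusion round samples parent demonstrations and asks the model to combine their structures, so later rounds mix increasingly varied parents; comparing `pool_diversity` before and after shows whether the pool actually became more diverse.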
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: Syntax: Parsing, NLP Applications
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2492