$\mathcal{V}$-Synthesis: Task-Agnostic Synthesis of Consistent and Diverse In-Context Demonstrations from Scratch via $\mathcal{V}$-Entropy

ACL ARR 2025 May Submission 2661 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · License: CC BY 4.0
Abstract: The high labeling cost of in-context learning (ICL) demonstrations motivates using large language models (LLMs) to synthesize them and reduce overhead. However, existing synthesis methods are mostly task-specific or rely on pre-existing demonstrations, so this paper focuses on synthesizing demonstrations from scratch for arbitrary tasks. A major challenge in synthesizing from scratch is ensuring consistency with the target task, since the lack of labeling guidance can introduce synthesis bias. We first propose a consistency metric called $\mathcal{V}$-Score, which achieves higher performance and lower computational cost than n-gram-based or embedding-based metrics. Furthermore, we introduce $\mathcal{V}$-Synthesis, which leverages $\mathcal{V}$-Score for proportional sampling to ensure both high consistency and high diversity of the synthesized demonstrations. Experimental results show that $\mathcal{V}$-Synthesis yields an average performance improvement of $2.0\%$ over existing synthesis methods, confirming its effectiveness.
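To make the abstract's core idea concrete, below is a minimal sketch (not the authors' implementation; the paper's exact definitions are not given here) of $\mathcal{V}$-Score-weighted proportional sampling. Assumptions: the score of a candidate demonstration approximates the negative conditional $\mathcal{V}$-entropy, operationalized as the log-probability of the label given the input under a scoring LLM (the `log_prob` callable is a hypothetical stand-in), and demonstrations are then drawn with probability proportional to their exponentiated scores so that high-consistency candidates are favored while diversity is retained.

    import math
    import random
    from typing import Callable, List, Tuple

    Demo = Tuple[str, str]  # (input text, label text)

    def v_score(demo: Demo, log_prob: Callable[[str, str], float]) -> float:
        """Score a candidate demonstration by how confidently the scoring
        model predicts its label from its input (higher = more consistent).
        `log_prob` is an assumed interface, e.g. mean log p(y | x) from an LLM."""
        x, y = demo
        return log_prob(x, y)

    def proportional_sample(demos: List[Demo],
                            scores: List[float],
                            k: int,
                            rng: random.Random) -> List[Demo]:
        """Draw k demonstrations without replacement, with probability
        proportional to exp(score); exponentiation keeps weights positive."""
        pool = list(zip(demos, (math.exp(s) for s in scores)))
        chosen: List[Demo] = []
        for _ in range(min(k, len(pool))):
            total = sum(w for _, w in pool)
            r = rng.uniform(0.0, total)
            acc = 0.0
            for i, (d, w) in enumerate(pool):
                acc += w
                if acc >= r:
                    chosen.append(d)
                    pool.pop(i)
                    break
        return chosen

    # Usage with a dummy scorer standing in for an LLM:
    rng = random.Random(0)
    cands = [("2+2=", "4"), ("Capital of France?", "Paris"), ("2+2=", "5")]
    fake_lp = lambda x, y: {"4": -0.1, "Paris": -0.2, "5": -3.0}[y]
    scores = [v_score(d, fake_lp) for d in cands]
    print(proportional_sample(cands, scores, k=2, rng=rng))

Sampling proportionally to the score, rather than greedily taking the top-k candidates, is what lets the inconsistent outlier ("2+2=", "5") be suppressed without collapsing the selection onto a single high-scoring template.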
Paper Type: Long
Research Area: Generation
Research Area Keywords: inference methods, few-shot generation
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 2661