Abstract: The high cost of obtaining high-quality annotated data for in-context learning (ICL) has motivated the development of methods that use self-generated annotations in place of ground-truth labels. While these approaches have shown promising results in few-shot settings, they generally do not scale to many-shot scenarios. In this work, we investigate ICL with self-generated examples through a framework analogous to traditional semi-supervised learning, consisting of annotation generation, demonstration selection, and in-context inference. Within this framework, we propose a simple baseline that outperforms ground-truth ICL in zero-shot, few-shot, and many-shot settings. Remarkably, we observe a \textit{scaling law} for this simple baseline, with optimal performance achieved at over 1,000 demonstrations. To fully exploit this many-shot capability of semi-supervised ICL, we introduce IterPSD, an iterative annotation approach that incorporates iterative refinement and curriculum pseudo-labeling concepts from semi-supervised learning, achieving up to 6.8% additional gains on classification tasks.
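The sketch below illustrates the three-stage pipeline named in the abstract (annotation generation, demonstration selection, in-context inference) plus an IterPSD-style iterative loop. It is a minimal illustration under stated assumptions, not the paper's exact method: `llm_complete` is a hypothetical text-completion call, and the prompt format, first-k selection rule, and chunk/round sizes are placeholder choices.

```python
# Minimal sketch of semi-supervised ICL, assuming a hypothetical
# `llm_complete(prompt) -> str` LLM call. Prompt formats and the
# selection heuristic are illustrative, not the paper's method.
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, pseudo-label)

def generate_annotations(inputs: List[str],
                         llm_complete: Callable[[str], str]) -> List[Example]:
    """Stage 1: pseudo-label unlabeled inputs with zero-shot prompts."""
    return [(x, llm_complete(f"Input: {x}\nLabel:").strip()) for x in inputs]

def select_demonstrations(pool: List[Example], k: int) -> List[Example]:
    """Stage 2: pick k demonstrations (here simply the first k; a real
    selector might use confidence or diversity criteria)."""
    return pool[:k]

def in_context_inference(query: str, demos: List[Example],
                         llm_complete: Callable[[str], str]) -> str:
    """Stage 3: many-shot inference with self-generated demonstrations."""
    prompt = "".join(f"Input: {x}\nLabel: {y}\n\n" for x, y in demos)
    prompt += f"Input: {query}\nLabel:"
    return llm_complete(prompt).strip()

def iterative_pseudo_labeling(unlabeled: List[str],
                              llm_complete: Callable[[str], str],
                              rounds: int = 3, chunk: int = 100) -> List[Example]:
    """IterPSD-style loop (illustrative): annotate the pool chunk by chunk,
    feeding previously pseudo-labeled examples back as demonstrations, so
    later chunks are labeled with progressively more context."""
    labeled: List[Example] = []
    for _ in range(rounds):
        batch, unlabeled = unlabeled[:chunk], unlabeled[chunk:]
        demos = select_demonstrations(labeled, k=min(len(labeled), 32))
        labeled += [(x, in_context_inference(x, demos, llm_complete))
                    for x in batch]
    return labeled
```

The chunked loop mirrors the curriculum idea from the abstract: early, easier annotations become demonstrations for later ones, which is what lets the pipeline grow toward the many-shot regime (1,000+ demonstrations) without any ground-truth labels.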
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Test-Time Computing, Large Language Models, Unlabeled Data, NLP in resource-constrained settings, data-efficient training, prompting, scaling, few-shot generation
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English, Bemba, Tuvan, Venetian, Sardinian, Faroese
Submission Number: 4983