Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Keywords: uncertainty estimation, in-context learning, active learning, large language models, black-box APIs, few-shot learning, example selection, Cover-ICL
TL;DR: We show that black-box and grey-box language models require fundamentally different uncertainty-based example selection strategies: Cover-ICL works best with logit access, while simple hardest selection excels for API-only models.
Abstract: In-context learning (ICL) with large language models (LLMs) has proven effective, but performance depends heavily on demonstration quality, while annotation budgets remain constrained. Existing uncertainty-based selection methods such as Cover-ICL achieve strong performance through logit-based uncertainty estimation, but most production LLMs operate as black-box APIs where internal states are inaccessible. This paper investigates whether effective uncertainty-guided example selection can be maintained under black-box constraints by developing a consistency-based uncertainty estimate that relies only on output observations. We evaluate five active learning methods (random, hardest, VoteK, fast-VoteK, and Cover-ICL) across seven benchmark datasets under both grey-box and black-box settings. The experiments reveal paradigm-dependent strategies: the grey-box setting performs best with Cover-ICL (60.83% average accuracy), while the black-box setting favors hardest selection (69.26% average accuracy), but no single method dominates across all datasets. Our framework enables practitioners to choose an appropriate uncertainty estimation strategy based on model accessibility constraints in practical deployment scenarios.
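The abstract's core idea, estimating uncertainty from output consistency when logits are unavailable, can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's actual implementation: the model is abstracted as an opaque callable sampled with temperature > 0, and all names here (query_fn, consistency_uncertainty, select_hardest) are hypothetical.

```python
# Sketch of consistency-based uncertainty estimation for a black-box LLM,
# combined with "hardest" selection under an annotation budget. The model
# is an opaque callable returning a text answer; names are illustrative.
from collections import Counter
from typing import Callable, List, Tuple

def consistency_uncertainty(
    query_fn: Callable[[str], str],  # black-box call, sampled with temperature > 0
    prompt: str,
    n_samples: int = 5,
) -> float:
    """Uncertainty = 1 - (majority-vote agreement rate) over repeated samples."""
    answers = [query_fn(prompt) for _ in range(n_samples)]
    _, majority_count = Counter(answers).most_common(1)[0]
    return 1.0 - majority_count / n_samples

def select_hardest(
    query_fn: Callable[[str], str],
    pool: List[str],   # unlabeled candidate examples (as prompts)
    budget: int,       # annotation budget
) -> List[str]:
    """'Hardest' selection: keep the examples the model answers least consistently."""
    scored: List[Tuple[float, str]] = [
        (consistency_uncertainty(query_fn, p), p) for p in pool
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:budget]]
```

In a grey-box setting, consistency_uncertainty would instead be replaced by a logit-based score (e.g., token-level entropy), which is the access assumption under which Cover-ICL operates.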
Submission Track: Workshop Paper Track
Submission Number: 15