Keywords: active sequential hypothesis testing, pure exploration, reinforcement learning, in-context learning, best arm identification
Abstract: We study the problem of _active sequential hypothesis testing_, also known as _pure exploration_: given a new task, the learner _adaptively collects data_ from the environment to efficiently determine the underlying correct hypothesis. A classical instance of this problem is identifying the best arm in a multi-armed bandit (a.k.a. Best-Arm Identification, BAI), where actions index hypotheses. Another important case is generalized search, the problem of determining the correct label through a sequence of strategically selected queries that indirectly reveal information about the label.
In this work, we introduce _In-Context Pure Exploration_ (ICPE), which meta-trains Transformers to map _observation histories_ to _query actions_ and a _predicted hypothesis_, yielding a model that transfers in-context. At inference time, ICPE actively gathers evidence on new tasks and infers the true hypothesis without parameter updates.
Across deterministic, stochastic, and structured benchmarks, including BAI and generalized search, ICPE is competitive with adaptive baselines while requiring no explicit modeling of information structure. Our results support Transformers as practical architectures for _general sequential testing_.
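A minimal sketch (hypothetical names, not the authors' released code) of the in-context loop the abstract describes: a meta-trained Transformer maps the observation history to a query action and a predicted hypothesis, and at inference time it gathers evidence on a new task with no parameter updates.

```python
import torch
import torch.nn as nn


class ICPEPolicy(nn.Module):
    """Toy Transformer that reads an observation history and emits
    a query-action distribution and a hypothesis prediction."""

    def __init__(self, obs_dim: int, n_actions: int, n_hypotheses: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d_model, n_actions)         # which query to issue next
        self.hypothesis_head = nn.Linear(d_model, n_hypotheses)  # current best guess of the hypothesis

    def forward(self, history: torch.Tensor):
        # history: (batch, time, obs_dim) encoding of past (action, outcome) pairs
        h = self.encoder(self.embed(history))
        last = h[:, -1]  # summary of the history so far
        return self.action_head(last), self.hypothesis_head(last)


# Inference-time use on a new task: gather evidence in context, then predict.
if __name__ == "__main__":
    policy = ICPEPolicy(obs_dim=8, n_actions=5, n_hypotheses=5)
    policy.eval()
    history = torch.zeros(1, 1, 8)  # placeholder initial observation
    with torch.no_grad():
        for _ in range(10):
            action_logits, hyp_logits = policy(history)
            action = action_logits.argmax(-1)       # query chosen from context alone
            outcome = torch.randn(1, 1, 8)          # stand-in for the environment's response
            history = torch.cat([history, outcome], dim=1)
    print("predicted hypothesis:", hyp_logits.argmax(-1).item())
```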
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 20774