Keywords: active sequential hypothesis testing, pure exploration, reinforcement learning, in-context learning, best arm identification
Abstract: We study the problem of _active sequential hypothesis testing_, also known as _pure exploration_: given a new task, the learner _adaptively collects data_ from the environment to efficiently determine the underlying correct hypothesis. A classical instance of this problem is identifying the best arm in a multi-armed bandit (a.k.a. Best-Arm Identification, BAI), where actions index hypotheses. Another important case is generalized search, the problem of determining the correct label through a sequence of strategically selected queries that indirectly reveal information about the label.
In this work, we introduce _In-Context Pure Exploration_ (ICPE), which meta-trains Transformers to map _observation histories_ to _query actions_ and a _predicted hypothesis_, yielding a model that transfers in-context. At inference time, ICPE actively gathers evidence on new tasks and infers the true hypothesis without parameter updates.
Across deterministic, stochastic, and structured benchmarks, including BAI and generalized search, ICPE is competitive with adaptive baselines while requiring no explicit modeling of information structure. Our results support Transformers as practical architectures for _general sequential testing_.
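A minimal sketch (hypothetical names, not the authors' released code) of the in-context loop the abstract describes: a meta-trained Transformer maps the observation history to a query action and a predicted hypothesis, and at inference time it gathers evidence on a new task with no parameter updates.

```python
import torch
import torch.nn as nn


class ICPEPolicy(nn.Module):
    """Toy Transformer that reads an observation history and emits
    a query-action distribution and a hypothesis prediction."""

    def __init__(self, obs_dim: int, n_actions: int, n_hypotheses: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d_model, n_actions)         # which query to issue next
        self.hypothesis_head = nn.Linear(d_model, n_hypotheses)  # current best guess of the hypothesis

    def forward(self, history: torch.Tensor):
        # history: (batch, time, obs_dim) encoding of past (action, outcome) pairs
        h = self.encoder(self.embed(history))
        last = h[:, -1]  # summary of the history so far
        return self.action_head(last), self.hypothesis_head(last)


# Inference-time use on a new task: gather evidence in context, then predict.
if __name__ == "__main__":
    policy = ICPEPolicy(obs_dim=8, n_actions=5, n_hypotheses=5)
    policy.eval()
    history = torch.zeros(1, 1, 8)  # placeholder initial observation
    with torch.no_grad():
        for _ in range(10):
            action_logits, hyp_logits = policy(history)
            action = action_logits.argmax(-1)       # query chosen from context alone
            outcome = torch.randn(1, 1, 8)          # stand-in for the environment's response
            history = torch.cat([history, outcome], dim=1)
    print("predicted hypothesis:", hyp_logits.argmax(-1).item())
```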
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 20774