Abstract: We study fixed-confidence Best Arm Identification (BAI) in semiparametric bandits, where rewards are linear in arm features plus an unknown additive baseline shift.
Unlike linear-bandit BAI, this setting requires orthogonalized regression, and its instance-optimal sample complexity has remained open.
For the transductive setting, we establish an attainable instance-dependent lower bound characterized by the corresponding linear-bandit complexity on shifted features.
We then propose a computationally efficient phase-elimination algorithm based on a new $\mathcal{X}\mathcal{Y}$-design for orthogonalized regression.
Our analysis yields a high-probability sample-complexity upper bound that is nearly optimal up to logarithmic factors and an additive $d^2$ term.
Experiments on synthetic instances and the Jester dataset show clear gains over prior baselines.
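
To illustrate why the baseline shift calls for orthogonalized regression, the following is a minimal toy sketch and not the paper's algorithm or its $\mathcal{X}\mathcal{Y}$-design. It assumes one common form of orthogonalization: arms are drawn from a fixed sampling distribution and features are centered by their mean under that distribution before least squares. The arm features, sampling probabilities, parameter `theta`, shift sequence, and the helper names `simulate` and `least_squares` are all hypothetical choices made for illustration.

```python
import math
import random


def simulate(n=50000, seed=0):
    """Draw n rounds from a toy semiparametric bandit with 2-d features."""
    rng = random.Random(seed)
    arms = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # hypothetical arm features
    probs = [0.4, 0.4, 0.2]                      # hypothetical sampling design
    theta = (1.0, -0.5)                          # hypothetical true parameter
    # Mean feature under the sampling design; used as the centering term.
    mean_x = tuple(sum(p * a[j] for p, a in zip(probs, arms)) for j in range(2))
    data = []
    for t in range(n):
        x = rng.choices(arms, weights=probs)[0]
        # Baseline shift: unknown to the learner, identical for every arm in round t.
        shift = 1.0 + 0.5 * math.cos(t / 50.0)
        noise = rng.gauss(0.0, 0.1)
        reward = x[0] * theta[0] + x[1] * theta[1] + shift + noise
        data.append((x, reward))
    return data, mean_x, theta


def least_squares(data, center=(0.0, 0.0)):
    """Solve the 2x2 normal equations for regressing reward on (x - center)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x, r in data:
        z1, z2 = x[0] - center[0], x[1] - center[1]
        a11 += z1 * z1
        a12 += z1 * z2
        a22 += z2 * z2
        b1 += z1 * r
        b2 += z2 * r
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


data, mean_x, theta = simulate()
naive = least_squares(data)                    # ignores the shift, so it is biased
centered = least_squares(data, center=mean_x)  # centered (orthogonalized) regression
print("true theta:       ", theta)
print("naive estimate:   ", naive)
print("centered estimate:", centered)
```

Because the arm is drawn independently of the shift in each round, the centered feature has zero mean and is uncorrelated with the shift, so the shift drops out of the estimate in expectation. The uncentered regression instead absorbs the average shift into its coefficients.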
Code Dataset Promise: Yes
Code Dataset Url: https://github.com/seokjinkim0428/SP-BAI
Submission Number: 1927