Abstract: We study the best arm identification (BAI) problem
with potentially biased offline data in the fixed-confidence
setting, which commonly arises in real-world
scenarios such as clinical trials. We prove
an impossibility result for adaptive algorithms
without prior knowledge of the bias bound between
online and offline distributions. To address
this, we propose the LUCB-H algorithm, which
introduces adaptive confidence bounds by incorporating
an auxiliary bias correction to balance offline
and online data within the LUCB framework.
Theoretical analysis shows that LUCB-H matches
the sample complexity of standard LUCB when
offline data is misleading and significantly outperforms
it when offline data is helpful. We also
derive an instance-dependent lower bound that
matches the upper bound of LUCB-H in certain
scenarios. Numerical experiments further demonstrate
the robustness and adaptability of LUCB-H
in effectively incorporating offline data.