Linear Bandits with Partially Observable Features

27 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, CC BY 4.0
Keywords: Linear Bandits, Partially Observable Features, Doubly Robust
TL;DR: This work proposes an algorithm for linear bandits with latent features, achieving sublinear regret by augmenting orthogonal basis vectors and using a doubly-robust reward estimator, without requiring prior knowledge of the unobserved feature space.
Abstract:

We introduce a novel linear bandit problem in which a subset of features is latent, so the learner has only partial access to reward information and may form spurious estimates. If the latent features are not properly addressed, the regret grows linearly in the decision horizon $T$; improving the regret bound is challenging because neither the dimension of the latent features nor their relationship with the rewards is available. We propose a novel analysis to handle the latent features and an algorithm that achieves a regret bound sublinear in $T$. The core of the algorithm lies in (i) augmenting the observable features with basis vectors orthogonal to the observable feature space, and (ii) developing an efficient doubly robust estimator that further improves the regret bound. With these two ingredients, our algorithm achieves a regret bound of $\tilde{O}(\sqrt{(d + d_h)T})$, where $d$ is the dimension of the observable features and $d_h$ is the unknown dimension of the unobserved features that affect the reward. Crucially, our algorithm does not rely on prior knowledge of the unobserved feature space, which expands as more features become hidden. Numerical experiments confirm that our algorithm outperforms both non-contextual multi-armed bandit algorithms and other linear bandit algorithms.
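
To make ingredient (i) concrete, the following is a minimal sketch of one plausible reading of the augmentation step; it is not the authors' implementation. It assumes a fixed set of $K$ arms with observed features stored as the columns of a $d \times K$ matrix, and it builds the orthogonal basis from an SVD. The function name `augment_with_orthogonal_basis`, the synthetic data, and the choice of SVD are all illustrative assumptions. The doubly robust estimator of ingredient (ii) is not shown.

```python
import numpy as np


def augment_with_orthogonal_basis(X, tol=1e-10):
    """Stack X (d x K, one column per arm) on top of an orthonormal basis
    of the orthogonal complement of its row space in R^K.

    Hypothetical helper for illustration only.
    """
    # Rows of Vt form an orthonormal basis of R^K; the first r rows span
    # the row space of X, the remaining rows span its orthogonal complement.
    _, s, Vt = np.linalg.svd(X, full_matrices=True)
    r = int(np.sum(s > tol))
    complement = Vt[r:]
    return np.vstack([X, complement]), complement


# Synthetic problem (assumed for illustration): rewards depend on both
# observed features X and latent features Z that the learner never sees.
rng = np.random.default_rng(0)
d, d_h, K = 3, 2, 8
X = rng.normal(size=(d, K))
Z = rng.normal(size=(d_h, K))
theta = rng.normal(size=d)
phi = rng.normal(size=d_h)
mu = X.T @ theta + Z.T @ phi  # true mean reward of each arm

# A linear fit on the observed features alone is misspecified.
coef_obs, *_ = np.linalg.lstsq(X.T, mu, rcond=None)
err_obs = np.linalg.norm(X.T @ coef_obs - mu)

# With the augmented features, the mean rewards are exactly linear again.
X_aug, B = augment_with_orthogonal_basis(X)
coef_aug, *_ = np.linalg.lstsq(X_aug.T, mu, rcond=None)
err_aug = np.linalg.norm(X_aug.T @ coef_aug - mu)

print(f"residual, observed features only: {err_obs:.3e}")
print(f"residual, augmented features:     {err_aug:.3e}")
```

In this sketch the augmented matrix spans all of $\mathbb{R}^K$, so any mean reward vector is representable without knowing $Z$ or $d_h$. How the regret comes to depend only on $d + d_h$ rather than on $K$ is part of the paper's analysis and is not captured here.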

Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11820