Sequential Experimental Design for Transductive Linear Bandits

Lalit Jain, Kevin Jamieson, Tanner Fiez, Lillian Ratliff

06 Sept 2019 (modified: 05 May 2023), NeurIPS 2019
Abstract: In this paper we introduce the \emph{transductive linear bandit problem}: given a set of measurement vectors $\mathcal{X}\subset\mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\argmax_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible. This setting generalizes \emph{linear bandits} when $\mathcal{X}=\mathcal{Z}$, and \emph{combinatorial bandits} when $\mathcal{X}$ is the set of standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$. Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. For example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner is willing to evaluate in the lab in vitro, due to cost or safety reasons, may differ vastly from the compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to well-known best-sellers even though the goal is to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these bounds up to logarithmic factors, and an empirical evaluation. In particular, we provide the first non-asymptotic algorithm for linear bandits that nearly achieves the information-theoretic lower bound.
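To make the problem setup concrete, below is a minimal sketch (not the paper's algorithm) that simulates noisy measurements $x^\top\theta^\ast$ over a measurement set $\mathcal{X}$ and uses a naive non-adaptive least-squares fit to guess $\argmax_{z\in\mathcal{Z}} z^\top\theta^\ast$; the dimensions, set sizes, and noise level are illustrative assumptions, not values from the paper or its code release.

```python
import numpy as np

# Illustrative transductive linear bandit instance (all sizes/noise are assumptions).
rng = np.random.default_rng(0)
d = 5
X = rng.standard_normal((20, d))      # measurement vectors the learner may query
Z = rng.standard_normal((50, d))      # items among which the best must be identified
theta_star = rng.standard_normal(d)   # unknown parameter theta*

best_item = int(np.argmax(Z @ theta_star))  # quantity the learner must infer

def pull(i, sigma=1.0):
    """Noisy measurement X[i]^T theta* + Gaussian noise (sigma is an assumed noise level)."""
    return X[i] @ theta_star + sigma * rng.standard_normal()

# Naive non-adaptive baseline: query uniformly at random, fit theta by least squares,
# then guess the best item in Z. The paper's algorithm instead allocates measurements
# adaptively to drive down uncertainty about the gaps z^T theta* specifically over Z.
T = 200
idx = rng.integers(len(X), size=T)
y = np.array([pull(i) for i in idx])
theta_hat, *_ = np.linalg.lstsq(X[idx], y, rcond=None)
guess = int(np.argmax(Z @ theta_hat))
print(f"true best item: {best_item}, guessed item: {guess}")
```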
Code Link: https://github.com/fiezt/Transductive-Linear-Bandit-Code
CMT Num: 5689