High Probability Contextual Bandits for Optimal Dosage Selection

Spyros Dragazis; Aldo Pacchiano

27 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, CC BY 4.0

Keywords: Linear Bandits, Dosage Selection, Contextual Bandits

TL;DR: We model the dosage selection problem as a linear contextual bandit with high-probability constraints.

Abstract: Multi-Armed Bandit ($\textit{MAB}$) formulations are commonly used to model the problem of $\textit{Optimal Dose-Finding}$. However, in many practical applications, the learner must first observe data about the patient’s current state and then administer a drug dosage adapted to that state, which a context-free MAB cannot express. To account for this context, we adopt a linear contextual bandit formulation with stage-wise constraints. At each round, the learner selects a dosage and receives both a reward signal and a cost signal. The learner’s goal is to maximize the drug's efficacy, captured as the expected cumulative reward, while ensuring that the toxicity, reflected by the cost signal, remains below a known threshold. Satisfying the cost constraint only in expectation can be dangerous, as individual realizations of the cost may still exceed the threshold and lead to over-dosage complications. To address this issue, we introduce a novel model that controls the realization of the cost signal with high probability, in contrast to previous works, which control only the expected cost signal. Our algorithm follows the upper confidence bound ($\textit{UCB}$) approach; we establish a regret bound for it over $T$ rounds and evaluate it in numerical experiments. We further generalize our results to $\textit{non-linear}$ functions and provide a regret bound in terms of the $\textit{eluder dimension}$, a measure of function class complexity.
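To make the setting in the abstract concrete, here is a minimal Python sketch of a UCB-style learner with a stage-wise, high-probability cost constraint. It is an illustration under stated assumptions, not the authors' algorithm: it assumes Gaussian cost noise with known standard deviation `sigma`, a known-safe baseline dosage in row 0 of every action set, and a fixed confidence multiplier `beta` in place of a theoretically derived confidence radius. The class name `SafeLinUCB` and all parameter names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


class SafeLinUCB:
    """Illustrative optimistic/pessimistic linear bandit (not the paper's algorithm)."""

    def __init__(self, dim, tau, sigma, delta, beta=1.0, lam=1.0):
        self.tau = tau    # toxicity threshold, assumed known
        self.beta = beta  # confidence-width multiplier (hypothetical tuning knob)
        # Assumption: Gaussian cost noise. This margin keeps the *realized* cost
        # below tau with probability >= 1 - delta once the mean cost is bounded.
        self.margin = sigma * NormalDist().inv_cdf(1.0 - delta)
        self.V = lam * np.eye(dim)     # regularized design matrix
        self.b_reward = np.zeros(dim)  # running sum of feature * reward
        self.b_cost = np.zeros(dim)    # running sum of feature * cost

    def select(self, features):
        """Pick a row of `features` (one row per candidate dosage for this patient).

        Row 0 is assumed to be a known-safe baseline dosage.
        """
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b_reward  # ridge estimate of reward parameter
        mu_hat = V_inv @ self.b_cost       # ridge estimate of cost parameter
        # Confidence width beta * ||phi||_{V^{-1}} for every candidate row.
        quad = np.einsum("ij,jk,ik->i", features, V_inv, features)
        width = self.beta * np.sqrt(quad)
        reward_ucb = features @ theta_hat + width  # optimistic about efficacy
        cost_ucb = features @ mu_hat + width       # pessimistic about toxicity
        safe = cost_ucb + self.margin <= self.tau
        safe[0] = True  # the assumed baseline is always allowed
        return int(np.argmax(np.where(safe, reward_ucb, -np.inf)))

    def update(self, phi, reward, cost):
        """Record one observed (feature, reward, cost) triple."""
        self.V += np.outer(phi, phi)
        self.b_reward += reward * phi
        self.b_cost += cost * phi
```

In a simulation loop, one would call `select` on the current patient's candidate feature matrix, administer the chosen dosage, and pass the observed reward and cost to `update`. The design choice worth noting is the asymmetry: the reward estimate is inflated to encourage exploration, while the cost estimate is inflated and then padded by the noise margin so that a dosage is only tried once the data support that its realized toxicity is unlikely to exceed the threshold.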

Supplementary Material: zip

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 11525
