Keywords: Linear Bandits, Dosage Selection, Contextual Bandits
TL;DR: The dosage selection problem, modeled as a linear contextual bandit under high-probability constraints.
Abstract: Multi-Armed Bandit ($\textit{MAB}$) formulations are commonly used to model the problem of $\textit{Optimal Dose-Finding}$.
However, in many practical applications, the learner first observes data about the patient’s current state and must then administer a drug dosage adapted to that state.
To account for this, we adopt a linear contextual bandit formulation with stage-wise constraints.
At each round, the learner selects a dosage and receives both a reward signal and a cost signal.
The learner’s goal is to maximize the drug's efficacy (the expected cumulative reward) while keeping the toxicity, reflected by the cost signal, below a known threshold.
Satisfying the cost constraint only in expectation can be dangerous: individual realizations of the cost may still exceed the threshold, which can lead to overdose complications.
To address this, we introduce a novel model that controls the realization of the cost signal with high probability, whereas previous works only constrained its expected value.
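To illustrate why an expectation-level constraint is weaker than a high-probability one, the sketch below assumes, for illustration only, that the realized cost of a fixed dose is Gaussian with standard deviation `sigma`. A dose whose mean cost sits exactly at the threshold $\tau$ satisfies the expectation constraint yet exceeds $\tau$ half the time; requiring $P(\text{cost} \le \tau) \ge 1 - \delta$ instead forces the mean below $\tau - \sigma z_{1-\delta}$. The Gaussian noise model and the values of `tau`, `sigma`, and `delta` are assumptions, not taken from the paper.

```python
from statistics import NormalDist


def violation_probability(mean_cost: float, sigma: float, threshold: float) -> float:
    """P(realized cost > threshold) when cost ~ N(mean_cost, sigma^2) (assumed noise model)."""
    return 1.0 - NormalDist(mean_cost, sigma).cdf(threshold)


def max_safe_mean(sigma: float, threshold: float, delta: float) -> float:
    """Largest mean cost with P(cost <= threshold) >= 1 - delta under Gaussian noise."""
    z = NormalDist().inv_cdf(1.0 - delta)  # standard normal (1 - delta)-quantile
    return threshold - sigma * z


tau, sigma, delta = 1.0, 0.2, 0.05  # hypothetical threshold, noise level, failure probability

# Expectation-only constraint: a mean cost exactly at the threshold counts as feasible.
p_expect = violation_probability(tau, sigma, tau)

# High-probability constraint: the mean must sit below the threshold by a noise-dependent margin.
mu_hp = max_safe_mean(sigma, tau, delta)
p_hp = violation_probability(mu_hp, sigma, tau)

print(f"expectation-only: P(violation) = {p_expect:.3f}")
print(f"high-probability: mean <= {mu_hp:.3f}, P(violation) = {p_hp:.3f}")
```

The margin $\sigma z_{1-\delta}$ grows with the noise level, so a high-probability constraint becomes more conservative exactly when individual cost realizations are more variable.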
Our algorithm follows the $\textit{UCB}$ approach; we establish a regret bound over $T$ rounds and evaluate the algorithm in numerical experiments.
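The abstract names a UCB-style algorithm but does not spell out its selection rule. As a hedged illustration only, the sketch below shows one common way to combine optimism and safety in linear contextual bandits: ridge-regression estimates for reward and cost, an optimistic (upper-confidence) reward, and a pessimistic cost bound plus an extra margin intended to keep the realized cost, not just its mean, below the threshold. The class name `SafeLinUCB`, the constant confidence multiplier `beta`, the `noise_margin` parameter, the feature map `[dose, context * dose]`, and the ground-truth parameters in the demo are all hypothetical; this is not the authors' algorithm, and a fixed `beta` carries none of the paper's guarantees.

```python
import numpy as np


class SafeLinUCB:
    """Hypothetical UCB-style learner with a pessimistic per-round cost check.

    Reward and cost are both modeled as linear in a feature vector x;
    one ridge-regression Gram matrix is shared by both models.
    """

    def __init__(self, dim, threshold, beta=1.0, noise_margin=0.0, reg=1.0):
        self.V = reg * np.eye(dim)        # regularized Gram matrix
        self.b_reward = np.zeros(dim)     # running sum of reward * x
        self.b_cost = np.zeros(dim)       # running sum of cost * x
        self.threshold = threshold        # known toxicity threshold
        self.beta = beta                  # confidence-width multiplier (assumed constant)
        self.noise_margin = noise_margin  # extra slack aimed at the realized cost, not only its mean

    def select(self, features):
        """features: (n_doses, dim) array, one row per candidate dose in the current context.

        Returns the index of the dose with the highest optimistic reward among those whose
        pessimistic cost is within the threshold, or None if no dose passes the check.
        """
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b_reward  # ridge estimate of reward parameters
        phi_hat = V_inv @ self.b_cost      # ridge estimate of cost parameters
        quad = np.einsum("ij,jk,ik->i", features, V_inv, features)  # x^T V^{-1} x per row
        widths = self.beta * np.sqrt(np.maximum(quad, 0.0))         # beta * ||x||_{V^{-1}}
        reward_ucb = features @ theta_hat + widths
        cost_pessimistic = features @ phi_hat + widths + self.noise_margin
        safe = np.flatnonzero(cost_pessimistic <= self.threshold)
        if safe.size == 0:
            return None
        return int(safe[np.argmax(reward_ucb[safe])])

    def update(self, x, reward, cost):
        """Rank-one update after observing reward and cost for the chosen feature vector x."""
        self.V += np.outer(x, x)
        self.b_reward += reward * x
        self.b_cost += cost * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_true = np.array([1.0, 0.5])  # hypothetical true reward parameters
    phi_true = np.array([0.8, 0.4])    # hypothetical true cost parameters
    doses = np.linspace(0.0, 1.0, 11)
    # noise_margin = 0.1 is about two noise standard deviations (0.05) in this toy setup.
    agent = SafeLinUCB(dim=2, threshold=0.6, beta=1.0, noise_margin=0.1)
    for _ in range(500):
        c = rng.uniform(0.0, 1.0)                 # scalar patient context (hypothetical)
        X = np.stack([doses, c * doses], axis=1)  # hypothetical feature map [dose, context * dose]
        i = agent.select(X)                       # dose 0 has zero features, so it always passes the check here
        x = X[i]
        agent.update(x, x @ theta_true + rng.normal(0.0, 0.05), x @ phi_true + rng.normal(0.0, 0.05))
    X_half = np.stack([doses, 0.5 * doses], axis=1)
    print("selected dose for context 0.5:", doses[agent.select(X_half)])
```

Sharing one Gram matrix is natural here because reward and cost are observed for the same chosen dose. In a principled version, `beta` would come from a self-normalized confidence bound that grows with the number of rounds rather than being fixed.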
We further generalize our results to $\textit{non-linear}$ functions and provide a regret bound in terms of the $\textit{eluder dimension}$, a measure of function class complexity.
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11525