ROI Maximization in Stochastic Online Decision-MakingDownload PDF

May 21, 2021 (edited Jan 21, 2022)NeurIPS 2021 PosterReaders: Everyone
  • Keywords: online learning, regret minimization, action elimination, explore-then-commit
  • TL;DR: We introduce and analyze a new theoretical framework for return-on-investment maximization in repeated decision-making
  • Abstract: We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making. Our setting is motivated by the use case of companies that regularly receive proposals for technological innovations and want to quickly decide whether they are worth implementing. We design an algorithm for learning ROI-maximizing decision-making policies over a sequence of innovation proposals. Our algorithm provably converges to an optimal policy in class $\Pi$ at a rate of order $\min\big\{1/(N\Delta^2),N^{-1/3}\}$, where $N$ is the number of innovations and $\Delta$ is the suboptimality gap in $\Pi$. A significant hurdle of our formulation, which sets it aside from other online learning problems such as bandits, is that running a policy does not provide an unbiased estimate of its performance.
  • Supplementary Material: pdf
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
11 Replies