Learning in Prophet Inequalities with Noisy Observations

ICLR 2026 Conference Submission 12930 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Prophet Inequalities, Learning, Stopping Time, Decision-Making
Abstract: We study the prophet inequality, a fundamental problem in online decision-making and optimal stopping, in a practical setting where rewards are observed only through noisy realizations and reward distributions are unknown. At each stage, the decision-maker receives a noisy reward whose true value follows a linear model with an unknown latent parameter, and observes a feature vector drawn from a distribution. To address this challenge, we propose algorithms that integrate learning and decision-making via lower-confidence-bound (LCB) thresholding. In the i.i.d. setting, we establish that both an Explore-then-Decide strategy and an $\varepsilon$-Greedy variant achieve the sharp competitive ratio of $1 - 1/e$. For non-identical distributions, we show that a competitive ratio of $1/2$ can be guaranteed against a relaxed benchmark. Moreover, with window access to past rewards, the optimal ratio of $1/2$ against the optimal benchmark is achieved. Experiments on synthetic datasets confirm our theoretical results and demonstrate the efficiency of our algorithms.
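To make the idea of LCB thresholding concrete, the following is a minimal sketch of an Explore-then-Decide style rule under the linear reward model described in the abstract. It is not the submission's algorithm: the ridge estimator, the confidence-width form `beta * sqrt(x^T V^{-1} x)`, the parameters `n_explore`, `beta`, `lam`, and the externally supplied `threshold` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def ridge_estimate(X, y, lam=1.0):
    """Ridge estimate of the latent parameter from noisy rewards (assumed estimator)."""
    d = X.shape[1]
    V = X.T @ X + lam * np.eye(d)          # regularized Gram matrix
    theta_hat = np.linalg.solve(V, X.T @ y)
    return theta_hat, V


def lcb(x, theta_hat, V, beta):
    """Lower confidence bound on the true reward <theta, x> (assumed width form)."""
    width = beta * np.sqrt(x @ np.linalg.solve(V, x))
    return float(x @ theta_hat - width)


def explore_then_decide(features, noisy_rewards, n_explore, threshold,
                        beta=1.0, lam=1.0):
    """Observe the first n_explore stages without stopping, then stop at the
    first later stage whose LCB reaches the threshold.

    Returns the stopping index, or None if no stage qualifies.
    """
    theta_hat, V = ridge_estimate(features[:n_explore],
                                  noisy_rewards[:n_explore], lam)
    for t in range(n_explore, len(features)):
        if lcb(features[t], theta_hat, V, beta) >= threshold:
            return t
    return None
```

Stopping on the lower bound rather than the point estimate is the conservative choice: a stage is accepted only when its reward is likely to exceed the threshold even after accounting for estimation error. How the threshold and the exploration length should be set to obtain the stated competitive ratios is specified in the paper, not in this sketch.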
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 12930