Estimate to Decide: Matrix Completion driven Smoothed Online Quadratic Optimization

Published: 28 Nov 2025, Last Modified: 30 Nov 2025
Venue: NeurIPS 2025 Workshop MLxOR
License: CC BY 4.0
Keywords: Online Algorithms, Matrix Estimation, Sequential Decision Making
Abstract: This work tackles **blind online optimization with movement costs**, in which a player makes sequential decisions that balance an unknown dynamic hitting cost $f_t(x)$ against a metric penalty $c(x_t,x_{t-1})$ for changing actions between consecutive rounds, while also having to estimate the structure of $f_t$. We study this problem for general quadratic costs under a restrictive, noisy bandit feedback model: the player observes only the location of the hitting cost before taking an action and, after acting, receives a single noisy value of the cost it incurred. We provide the first algorithm for this setting that provably achieves **sub-linear dynamic regret**, combining online matrix estimation with dynamic balancing of hitting and switching costs within a principled exploration-exploitation framework.
Submission Number: 11
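
The feedback model described in the abstract can be illustrated with a minimal sketch of a single round. This is not the paper's algorithm. It assumes a hitting cost of the form $f_t(x) = (x - v_t)^\top A (x - v_t)$ with $A$ hidden from the player, a Euclidean movement cost, and additive Gaussian noise, none of which the abstract specifies. All names (`quadratic_hitting_cost`, `switching_cost`, `play_round`, `midpoint_policy`) are hypothetical.

```python
import math
import random


def quadratic_hitting_cost(A, x, v):
    """Assumed form f_t(x) = (x - v)^T A (x - v); A is hidden from the player."""
    d = [xi - vi for xi, vi in zip(x, v)]
    n = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))


def switching_cost(x, x_prev):
    """Assumed movement cost c(x_t, x_{t-1}): Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prev)))


def play_round(A, v, x_prev, choose_action, noise_std, rng):
    """One round: the player sees only the location v, acts, then receives
    a single noisy evaluation of the hitting cost at the chosen action."""
    x = choose_action(v, x_prev)  # decision made without access to A
    feedback = quadratic_hitting_cost(A, x, v) + rng.gauss(0.0, noise_std)
    return x, feedback, switching_cost(x, x_prev)


def midpoint_policy(v, x_prev):
    """Placeholder policy: move halfway from the previous action toward v."""
    return [(a + b) / 2 for a, b in zip(v, x_prev)]


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[2.0, 0.5], [0.5, 1.0]]  # unknown to the player in the real setting
    x, noisy_cost, move = play_round(A, [2.0, 0.0], [0.0, 0.0],
                                     midpoint_policy, 0.1, rng)
    print(x, noisy_cost, move)
```

The midpoint rule is only a stand-in for the balancing step: a learner in this setting would have to choose actions that also reveal enough about the hidden matrix, which is the exploration-exploitation trade-off the abstract refers to.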