Risk-Averse Learning with Nonstationary Distribution

Published: 28 Feb 2026, Last Modified: 04 Apr 2026, CAO Poster, CC BY 4.0
Keywords: Zeroth-order optimization, nonstationary distribution, CVaR
Abstract: Considering nonstationary environments in online optimization enables a decision-maker to adapt effectively to changes and improve its performance over time. In such cases, it is favorable to adopt a strategy that minimizes the negative impact of change in order to avoid potentially risky situations. In this paper, we investigate risk-averse online optimization where the distribution of the random costs changes over time. The Conditional Value at Risk (CVaR) is employed as the risk measure. Because the exact CVaR gradient is difficult to obtain, we employ a zeroth-order optimization approach that queries the cost values multiple times per iteration and estimates the CVaR gradient from these samples. In the regret analysis, the varying distributions are captured by a novel variation metric based on the Wasserstein distance. Provided that the distribution variation is sublinear in the iteration horizon, we show that the developed learning algorithm achieves sublinear dynamic regret with high probability for both convex and strongly convex functions. Moreover, the theoretical results suggest that the dynamic regret bounds decrease as the number of samples per iteration increases, until that number reaches a specific limit. Finally, we provide numerical experiments on dynamic pricing in a parking lot to illustrate the efficacy of the designed algorithm.
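As a rough illustration of the approach the abstract describes, the sketch below estimates a CVaR gradient from repeated cost queries and runs projected descent under a drifting cost distribution. It is not the paper's algorithm: the one-point sphere-smoothing estimator, the empirical tail-average CVaR, the quadratic toy cost, and every name and parameter (`empirical_cvar`, `zeroth_order_cvar_gradient`, `run`, `delta`, `eta`, `n_samples`) are assumptions chosen for illustration.

```python
import numpy as np


def empirical_cvar(samples, alpha):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of sampled costs."""
    samples = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil((1.0 - alpha) * samples.size)))
    return samples[-k:].mean()


def zeroth_order_cvar_gradient(query_cost, x, alpha, delta, n_samples, rng):
    """One-point estimate (d / delta) * CVaR_hat(x + delta * u) * u.

    u is drawn uniformly from the unit sphere; the cost is queried
    n_samples times at the single perturbed point x + delta * u.
    """
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    costs = [query_cost(x + delta * u) for _ in range(n_samples)]
    return (d / delta) * empirical_cvar(costs, alpha) * u


def run(T=500, d=2, alpha=0.9, delta=0.2, eta=0.002, n_samples=20, seed=0):
    """Projected descent on a toy cost whose distribution drifts with t (assumed)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for t in range(T):
        # Hypothetical nonstationarity: the cost minimizer drifts slowly.
        target = np.full(d, 0.3 + 0.2 * t / T)

        def query_cost(z, target=target):
            # Noisy quadratic cost; each call draws a fresh random cost.
            return float(np.sum((z - target) ** 2) + 0.1 * rng.normal())

        g = zeroth_order_cvar_gradient(query_cost, x, alpha, delta, n_samples, rng)
        # Project onto a shrunk box so perturbed queries stay inside [-1, 1]^d.
        x = np.clip(x - eta * g, -(1.0 - delta), 1.0 - delta)
    return x


if __name__ == "__main__":
    print(run())
```

The shrunk projection set is a common device in zeroth-order methods: it keeps every queried point `x + delta * u` feasible. Raising `n_samples` reduces the error of the CVaR estimate at the perturbed point but not the variance caused by the random direction `u`, which gives some intuition for why the benefit of additional samples could saturate.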
Submission Number: 112