Online Inverse Linear Optimization: Efficient Logarithmic-Regret Algorithm, Robustness to Suboptimality, and Lower Bound
Keywords: inverse optimization, online learning
TL;DR: We present an efficient $O(n \ln T)$-regret method for online inverse linear optimization, extend it to suboptimal feedback, and provide an $\Omega(n)$-regret lower bound.
Abstract: In online inverse linear optimization, a learner observes time-varying sets of feasible actions and an agent's optimal actions, selected by solving linear optimization over the feasible actions. The learner sequentially makes predictions of the agent's true linear objective function, whose quality is measured by the *regret*, the cumulative gap between optimal objective values and those achieved by following the learner's predictions. A seminal work by Bärmann et al. (2017) obtained a regret bound of $O(\sqrt{T})$, where $T$ is the time horizon. Subsequently, the regret bound has been improved to $O(n^4 \ln T)$ by Besbes et al. (2021, 2025) and to $O(n \ln T)$ by Gollapudi et al. (2021), where $n$ is the dimension of the ambient space of objective vectors. However, these logarithmic-regret methods are highly inefficient when $T$ is large, as they need to maintain regions specified by $O(T)$ constraints, which represent possible locations of the true objective vector. In this paper, we present the first logarithmic-regret method whose per-round complexity is independent of $T$; indeed, it achieves the best-known bound of $O(n \ln T)$. Our method is strikingly simple: it applies the online Newton step (ONS) to appropriate exp-concave loss functions. Moreover, for the case where the agent's actions are possibly suboptimal, we establish a regret bound of $O(n\ln T + \sqrt{\Delta_T n\ln T})$, where $\Delta_T$ is the cumulative suboptimality of the agent's actions. This bound is achieved by using MetaGrad, which runs ONS with $\Theta(\ln T)$ different learning rates in parallel. We also present a lower bound of $\Omega(n)$, showing that the $O(n\ln T)$ bound is tight up to an $O(\ln T)$ factor.
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 6625
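For intuition, here is a minimal, self-contained sketch of the ONS template the abstract refers to; it is not the paper's actual algorithm. The class implements the standard online Newton step over the unit Euclidean ball, and the toy loop feeds it the gradient of an illustrative exp-concave surrogate loss built from the agent's observed action and the action induced by the learner's current prediction. The surrogate loss, the scaling constant `D`, and the synthetic data are assumptions made for illustration only; the abstract does not specify the paper's loss construction.

```python
import numpy as np


def project_ball_mahalanobis(y, A, radius=1.0, iters=60):
    """Project y onto {w : ||w||_2 <= radius} in the norm induced by the PSD matrix A.

    KKT conditions give w = (A + lam*I)^{-1} A y for some lam >= 0, so we bisect on lam.
    """
    if np.linalg.norm(y) <= radius:
        return y
    n = len(y)
    lo, hi = 0.0, 1.0
    # Grow hi until the corresponding point satisfies the norm constraint.
    while np.linalg.norm(np.linalg.solve(A + hi * np.eye(n), A @ y)) > radius:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(np.linalg.solve(A + mid * np.eye(n), A @ y)) > radius:
            lo = mid
        else:
            hi = mid
    return np.linalg.solve(A + hi * np.eye(n), A @ y)


class OnlineNewtonStep:
    """Textbook ONS on the unit ball:
    A_t = A_{t-1} + g_t g_t^T,  w_{t+1} = Proj_{A_t}(w_t - (1/gamma) A_t^{-1} g_t)."""

    def __init__(self, dim, gamma=1.0, eps=1.0):
        self.w = np.zeros(dim)          # current prediction of the objective vector
        self.A = eps * np.eye(dim)      # running second-order matrix
        self.gamma = gamma

    def update(self, grad):
        self.A += np.outer(grad, grad)
        y = self.w - np.linalg.solve(self.A, grad) / self.gamma
        self.w = project_ball_mahalanobis(y, self.A)
        return self.w


# Toy run on synthetic data: the agent maximizes <c_true, x> over a finite set X_t.
rng = np.random.default_rng(0)
n, T = 5, 200
c_true = rng.normal(size=n)
c_true /= np.linalg.norm(c_true)
learner = OnlineNewtonStep(dim=n)
regret = 0.0
for t in range(T):
    X_t = rng.normal(size=(10, n))              # rows are the feasible actions
    x_agent = X_t[np.argmax(X_t @ c_true)]      # agent's optimal action
    x_pred = X_t[np.argmax(X_t @ learner.w)]    # action induced by the learner's prediction
    regret += c_true @ (x_agent - x_pred)
    # Placeholder exp-concave surrogate (NOT necessarily the paper's loss):
    #   l_t(w) = -ln(1 - <w, x_agent - x_pred> / (2 D)),  with D >= max_x ||x||_2,
    # whose gradient at the current prediction is fed to ONS.
    D = max(np.linalg.norm(X_t, axis=1).max(), 1.0)
    grad = (x_pred - x_agent) / (2.0 * D - learner.w @ (x_agent - x_pred))
    learner.update(grad)
print(f"cumulative regret after T={T} rounds: {regret:.3f}")
```

In this sketch the per-round work is a few $n \times n$ linear solves, independent of $T$, which is the efficiency property the abstract highlights in contrast to methods that maintain regions defined by $O(T)$ constraints.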