Accelerating Quadratic Optimization with Reinforcement Learning

Jeffrey Ichnowski; Paras Jain; Bartolomeo Stellato; Goran Banjac; Michael Luo; Francesco Borrelli; Joseph E. Gonzalez; Ion Stoica; Ken Goldberg

Published: 09 Nov 2021, Last Modified: 26 May 2025, NeurIPS 2021 Poster, Readers: Everyone

Keywords: quadratic optimization, convex optimization, first-order methods, reinforcement learning for optimization

TL;DR: Using RL, we train a policy that adapts the internal parameters of a QP solver so that the solver converges faster.

Abstract: First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved. These methods face two persistent challenges: manual hyperparameter tuning and convergence time to high-accuracy solutions. To address these, we explore how Reinforcement Learning (RL) can learn a policy to tune parameters to accelerate convergence. In experiments with well-known QP benchmarks we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications, including the QPLIB, Netlib LP, and Maros-Mészáros problems. Code, models, and videos are available at https://berkeleyautomation.github.io/rlqp/.
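To make the idea of a policy that tunes solver parameters concrete, here is a minimal sketch, not the authors' implementation and not the OSQP API. It assumes the tuned parameter is the ADMM step size (commonly written rho) of an OSQP-style first-order iteration for the problem: minimize (1/2) x'Px + q'x subject to l <= Ax <= u. The names `admm_qp` and `residual_balancing_policy` are hypothetical, and the hand-written residual-balancing rule only stands in for a learned RL policy.

```python
import numpy as np

def admm_qp(P, q, A, l, u, policy=None, rho=0.1, sigma=1e-6,
            adapt_every=10, max_iter=2000, tol=1e-6):
    """Toy ADMM loop for: min 0.5 x'Px + q'x  s.t.  l <= Ax <= u.

    `policy` is a hypothetical hook: a callable mapping
    (rho, primal_residual, dual_residual) to a new rho.
    """
    n, m = P.shape[0], A.shape[0]
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    for k in range(1, max_iter + 1):
        # x-update: solve the regularized linear system (dense, for clarity).
        K = P + sigma * np.eye(n) + rho * A.T @ A
        x = np.linalg.solve(K, sigma * x - q + A.T @ (rho * z - y))
        Ax = A @ x
        # z-update: project onto the box [l, u]; y-update: dual ascent step.
        z = np.clip(Ax + y / rho, l, u)
        y = y + rho * (Ax - z)
        # Infinity-norm primal and dual residuals.
        r_prim = np.linalg.norm(Ax - z, np.inf)
        r_dual = np.linalg.norm(P @ x + q + A.T @ y, np.inf)
        if r_prim < tol and r_dual < tol:
            break
        # Periodically let the policy pick a new step size (clipped for safety).
        if policy is not None and k % adapt_every == 0:
            rho = float(np.clip(policy(rho, r_prim, r_dual), 1e-6, 1e6))
    return x, k

def residual_balancing_policy(rho, r_prim, r_dual, eps=1e-12):
    # Hand-written stand-in for a learned policy (an assumption of this
    # sketch): raise rho when the primal residual dominates, lower it otherwise.
    return rho * np.sqrt((r_prim + eps) / (r_dual + eps))

# Small illustrative problem: the unconstrained minimizer is (1, 1),
# and the box 0 <= x <= 0.5 is active at the solution.
P = np.eye(2)
q = np.array([-1.0, -1.0])
A = np.eye(2)
l = np.zeros(2)
u = np.full(2, 0.5)

x_fixed, iters_fixed = admm_qp(P, q, A, l, u)
x_adapt, iters_adapt = admm_qp(P, q, A, l, u, policy=residual_balancing_policy)
print(iters_fixed, iters_adapt, x_adapt)
```

The hook takes only residuals and the current rho to keep the sketch short; a learned policy could consume richer solver state, and nothing here should be read as a description of how RLQP is actually built.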

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

Code: https://github.com/berkeleyautomation/rlqp

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/accelerating-quadratic-optimization-with/code)
