QRIM: Quantum Robust Inner Minimization for Reinforcement Learning

17 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference · Desk Rejected Submission · CC BY 4.0
Keywords: Reinforcement learning, Quantum computing
Abstract: Reinforcement learning (RL) often fails when faced with environmental changes that were unseen during training. Robust reinforcement learning (RRL) tackles this challenge by optimizing policies against the worst case within an uncertainty set. However, RRL remains impractical due to the cost of its Max-Min optimization: exhaustively finding the worst case (the 'Min') within the uncertainty set 𝒰 requires O(|𝒰|) queries. Viewing this through a quantum lens, we raise a pivotal question: if we can query the environment in quantum superposition, can the Max-Min optimization of RRL be accelerated? Our answer is 'Yes'. Our method, called quantum robust inner minimization (QRIM), encodes the uncertainty set in a quantum superposition and amplifies low-return cases, enabling RL to solve the robust (i.e., worst-case) Bellman equation. Importantly, QRIM achieves a quadratic speed-up in query complexity, i.e., O(√|𝒰|), without altering the outer RL pipeline. Validated from classical simulation through to execution on real quantum hardware, QRIM learns policies that are more robust to unseen task variations than classical RL methods, while achieving a quadratic reduction in query complexity compared with classical RRL methods.
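The source of the O(√|𝒰|) query complexity is Grover-style amplitude amplification: marked (low-return) elements of the uncertainty set are phase-flipped and amplified, so the worst case can be located in roughly √(N/M) oracle queries rather than N. The sketch below is a minimal classical statevector simulation of that primitive, not QRIM itself; the function name `grover_find_below`, the toy `returns` array standing in for per-environment returns, and the fixed `threshold` are all illustrative assumptions.

```python
import numpy as np

def grover_find_below(returns, threshold):
    """Statevector simulation of Grover search for an index i with
    returns[i] < threshold (a stand-in for a low-return environment).

    Returns (most_probable_index, oracle_queries). Illustrative only:
    QRIM's actual circuit and uncertainty-set encoding are not shown.
    """
    N = len(returns)
    marked = np.array([r < threshold for r in returns], dtype=bool)
    M = int(marked.sum())
    if M == 0:
        return None, 0
    # Start in the uniform superposition over all |𝒰| = N candidates.
    state = np.full(N, 1.0 / np.sqrt(N))
    # Near-optimal iteration count ~ (pi/4) * sqrt(N/M): the quadratic speed-up.
    k = max(int(np.floor((np.pi / 4) * np.sqrt(N / M))), 1)
    for _ in range(k):
        state[marked] *= -1.0              # oracle: phase-flip low-return cases
        state = 2.0 * state.mean() - state  # diffusion: inversion about the mean
    probs = state ** 2
    return int(np.argmax(probs)), k
```

With N = 16 candidates and a single element below threshold, this uses k = ⌊(π/4)·√16⌋ = 3 oracle queries and concentrates ~96% of the probability on the worst case, versus up to 16 queries for exhaustive classical search.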
Primary Area: reinforcement learning
Submission Number: 9532