QRIM: Quantum Robust Inner Minimization for Reinforcement Learning

17 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference · Desk Rejected Submission · CC BY 4.0
Keywords: Reinforcement learning, Quantum computing
Abstract: Reinforcement learning (RL) often fails when faced with environmental changes that were unseen during training. Robust reinforcement learning (RRL) tackles this challenge by optimizing policies against the worst case within an uncertainty set. However, RRL remains impractical due to the cost of its Max-Min optimization: exhaustively finding the worst case (the 'Min') within the uncertainty set 𝒰 requires O(|𝒰|) queries. Viewing this through a quantum lens, we raise a pivotal question: if we can query the environment in quantum superposition, can the Max-Min optimization of RRL be accelerated? Our answer is 'Yes'. Our method, called quantum robust inner minimization (QRIM), encodes the uncertainty set in a quantum superposition and amplifies low-return cases, enabling RL to solve the robust (i.e., worst-case) Bellman equation. Importantly, QRIM achieves a quadratic speed-up in query complexity, i.e., O(√|𝒰|), without altering the outer RL pipeline. Validated from classical simulation through to execution on real quantum hardware, QRIM learns policies that are more robust to unseen task variations than classical RL methods, while achieving a quadratic reduction in query complexity compared with classical RRL methods.
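The source of the O(√|𝒰|) query complexity is Grover-style amplitude amplification: marked (low-return) elements of the uncertainty set are phase-flipped and amplified, so the worst case can be located in roughly √(N/M) oracle queries rather than N. The sketch below is a minimal classical statevector simulation of that primitive, not QRIM itself; the function name `grover_find_below`, the toy `returns` array standing in for per-environment returns, and the fixed `threshold` are all illustrative assumptions.

```python
import numpy as np

def grover_find_below(returns, threshold):
    """Statevector simulation of Grover search for an index i with
    returns[i] < threshold (a stand-in for a low-return environment).

    Returns (most_probable_index, oracle_queries). Illustrative only:
    QRIM's actual circuit and uncertainty-set encoding are not shown.
    """
    N = len(returns)
    marked = np.array([r < threshold for r in returns], dtype=bool)
    M = int(marked.sum())
    if M == 0:
        return None, 0
    # Start in the uniform superposition over all |𝒰| = N candidates.
    state = np.full(N, 1.0 / np.sqrt(N))
    # Near-optimal iteration count ~ (pi/4) * sqrt(N/M): the quadratic speed-up.
    k = max(int(np.floor((np.pi / 4) * np.sqrt(N / M))), 1)
    for _ in range(k):
        state[marked] *= -1.0              # oracle: phase-flip low-return cases
        state = 2.0 * state.mean() - state  # diffusion: inversion about the mean
    probs = state ** 2
    return int(np.argmax(probs)), k
```

With N = 16 candidates and a single element below threshold, this uses k = ⌊(π/4)·√16⌋ = 3 oracle queries and concentrates ~96% of the probability on the worst case, versus up to 16 queries for exhaustive classical search.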
Primary Area: reinforcement learning
Submission Number: 9532