Keywords: Reinforcement learning, Quantum computing
Abstract: Reinforcement learning (RL) often fails when faced with unexpected environmental changes that were unseen during training. Robust reinforcement learning (RRL) tackles this challenge by optimizing policies against the worst-case scenario defined within an uncertainty set. However, RRL remains impractical due to the cost of the Max-Min optimization: exhaustively finding the worst case (the "Min" step) within the uncertainty set 𝒰 incurs a query complexity of O(|𝒰|). Viewing this from a quantum perspective, we raise a pivotal question: if we can query the environment with quantum superpositions, is it possible to accelerate the Max-Min optimization of RRL? Our answer is "Yes". Our method, called quantum robust inner minimization (QRIM), encodes the uncertainty set in quantum superposition and amplifies low-return cases, thus enabling RL to solve the robust (i.e., worst-case) Bellman equation. Importantly, QRIM achieves a quadratic speed-up in query complexity, i.e., O(√|𝒰|), without altering the outer RL pipeline. Validated in settings ranging from classical simulation to execution on real quantum hardware, QRIM learns policies that are more robust to unseen task variations than those of classical RL methods, while achieving a quadratic reduction in query complexity compared to classical RRL methods.
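The query-count gap the abstract claims can be illustrated with a toy comparison: a classical inner minimization queries the return of every element of the uncertainty set, while Grover-style amplitude amplification locates a marked (low-return) element in roughly (π/4)·√|𝒰| oracle queries. This is a minimal sketch under illustrative assumptions; the uncertainty set, return function, and query counts below are hypothetical and not taken from the paper:

```python
import math

# Toy uncertainty set of environment perturbations (|U| = 64).
uncertainty_set = [0.1 * k for k in range(64)]

def estimated_return(perturbation):
    """Stand-in for a policy rollout under a perturbed environment."""
    return 1.0 - perturbation ** 2

# Classical inner minimization: one return query per element, O(|U|).
classical_queries = len(uncertainty_set)
worst_case = min(estimated_return(u) for u in uncertainty_set)

# Grover-style amplitude amplification needs about (pi/4) * sqrt(|U|)
# oracle queries to find a marked low-return element: O(sqrt(|U|)).
quantum_queries = math.ceil(math.pi / 4 * math.sqrt(len(uncertainty_set)))

print(classical_queries, quantum_queries)  # 64 vs 7 for this toy set
```

The gap widens quadratically: at |𝒰| = 10⁶, the classical count is 10⁶ queries while the Grover-style count is on the order of 10³.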
Primary Area: reinforcement learning
Submission Number: 9532