TL;DR: We design quantum algorithms that solve finite-horizon Markov decision process problems more efficiently than classical algorithms and establish quantum lower bounds to rule out the possibility of exponential quantum speedups.
Abstract: In this work, we design quantum algorithms that are more efficient than classical algorithms for solving time-dependent, finite-horizon Markov Decision Processes (MDPs) in two distinct settings: (1) In the exact dynamics setting, where the agent has full knowledge of the environment's dynamics (i.e., transition probabilities), we prove that our **Quantum Value Iteration (QVI)** algorithm **QVI-1** achieves a quadratic speedup in the size of the action space $(A)$ compared with the classical value iteration algorithm for computing the optimal policy ($\pi^{\ast}$) and the optimal V-value function ($V_{0}^{\ast}$). Furthermore, our algorithm **QVI-2** provides an additional speedup in the size of the state space $(S)$ when obtaining near-optimal policies and V-value functions. Both **QVI-1** and **QVI-2** achieve quantum query complexities that provably improve upon classical lower bounds, particularly in their dependence on $S$ and $A$. (2) In the generative model setting, where samples from the environment are accessible in quantum superposition, we prove that our algorithms **QVI-3** and **QVI-4** achieve improvements in sample complexity over the state-of-the-art (SOTA) classical algorithm in terms of $A$, the estimation error $(\epsilon)$, and the time horizon $(H)$. More importantly, we prove quantum lower bounds showing that **QVI-3** and **QVI-4** are asymptotically optimal, up to logarithmic factors, assuming a constant time horizon.
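For readers unfamiliar with the classical baseline being sped up, the following is a minimal sketch of finite-horizon value iteration with a time-dependent policy. This is standard textbook material, not the paper's quantum algorithm; the variable names (`P`, `r`, `H`) are illustrative. The comment marks the per-state maximization over actions, which is the step a quantum maximum-finding subroutine can accelerate quadratically in $A$.

```python
import numpy as np

def finite_horizon_value_iteration(P, r, H):
    """Classical finite-horizon value iteration (the classical baseline).

    P: array of shape (S, A, S); P[s, a, s'] = transition probability.
    r: array of shape (S, A); immediate rewards.
    H: time horizon.
    Returns V (values-to-go, shape (H+1, S)) and pi (time-dependent
    greedy policy, shape (H, S)).
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S))           # terminal values V[H] = 0
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):     # backward induction over time steps
        # Q[s, a] = r[s, a] + sum_{s'} P[s, a, s'] * V[h + 1, s']
        Q = r + P @ V[h + 1]
        # The classical max over actions costs O(A) per state; quantum
        # maximum finding is the standard route to an O(sqrt(A)) cost.
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return V, pi
```

On a toy MDP where one action yields reward 1 and keeps the state fixed, the value-to-go from the first step equals the horizon, as expected from summing a unit reward over $H$ steps.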
Lay Summary: Markov Decision Processes (MDPs) model how agents make decisions over time in uncertain environments, from robots navigating rooms to systems managing resources. However, solving these problems becomes increasingly hard as the number of choices (actions) or situations (states) grows. Our work uses quantum computing to tackle this challenge more efficiently than classical methods. We design four quantum algorithms that solve MDPs faster in two important scenarios: when the agent knows exactly how the world works (the exact dynamics model) and when it only learns by interacting with it (the generative model). These quantum algorithms offer provable speedups in key parameters, including the number of actions, the number of states, the time horizon, and the desired solution accuracy. Our theoretical analysis shows that our quantum speedups are nearly the best possible, and that exponential quantum speedups are impossible. Our results provide a step toward practical quantum speedups for reinforcement learning and stochastic control problems.
Primary Area: Reinforcement Learning
Keywords: Quantum algorithms, Markov decision processes, lower bounds
Submission Number: 459