TL;DR: This paper establishes theoretical guarantees for Deep Q-Networks (DQNs) approximating optimal Q-functions in continuous-time reinforcement learning, using tools from stochastic control and FBSDEs.
Abstract: We establish a continuous-time framework for analyzing Deep Q-Networks (DQNs) via stochastic control and Forward-Backward Stochastic Differential Equations (FBSDEs). Considering a continuous-time Markov Decision Process (MDP) driven by a square-integrable martingale, we analyze the approximation properties of DQNs. We show that DQNs can approximate the optimal Q-function on compact sets to arbitrary accuracy with high probability, leveraging residual network approximation theorems and large deviation bounds for the state-action process. We then analyze the convergence of a general Q-learning algorithm for training DQNs in this setting, adapting stochastic approximation theorems. Our analysis emphasizes the interplay between DQN layer count, time discretization, and the role of viscosity solutions (primarily for the value function $V^*$) in addressing potential non-smoothness of the optimal Q-function. This work bridges deep reinforcement learning and stochastic control, offering insights into the behavior of DQNs in continuous-time settings, relevant for applications involving physical systems or high-frequency data.
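As a rough illustration of the objects involved (the notation below is ours, not necessarily the paper's): for a discount rate $\beta > 0$, a running reward $r$, a controlled state process $X^a$ with the action $a$ held fixed over a small time step $\Delta t$, the optimal Q-function and value function satisfy a Bellman-type relation of the form

$$Q^*(x,a) \approx \mathbb{E}\!\left[\int_0^{\Delta t} e^{-\beta s}\, r(X_s^{a}, a)\, ds \;+\; e^{-\beta \Delta t}\, V^*(X_{\Delta t}^{a}) \;\middle|\; X_0 = x\right], \qquad V^*(x) = \sup_{a} Q^*(x,a).$$

The paper's results concern how accurately a DQN with finitely many residual layers can represent $Q^*$ under such a discretization, and whether Q-learning iterates converge to it.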
Lay Summary: Imagine teaching a computer to make smart decisions in situations that change smoothly over time, like guiding a self-driving car through flowing traffic or managing a complex power grid. Many current AI learning methods work well for situations with distinct steps, like a board game. Our research explores how a popular AI technique, called Deep Q-Networks (DQNs), can learn in these more fluid, continuous environments.
We show two key things. First, these DQNs are theoretically powerful enough to learn the "best possible moves" (or, more technically, figure out the optimal decision-making strategy) with high accuracy, no matter how complicated the continuous task is. Second, we demonstrate that the common ways these DQNs are trained will, under the right conditions, actually lead them to correctly learn these best moves.
To achieve this, we've developed a new mathematical foundation that connects these AI learning methods with established theories from control engineering. This work helps build a more solid understanding of how DQNs behave and learn in real-world scenarios that don't just jump from one step to the next, paving the way for more reliable and effective AI in dynamic systems.
Primary Area: Theory->Reinforcement Learning and Planning
Keywords: Reinforcement Learning, Deep Q-Networks, Q-function Approximation, Universal Approximation Theorem, Function Approximation, Neural Networks, Markov Decision Processes, Bellman Equation, Residual Networks, Representation Learning
Submission Number: 11377