Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning

ICLR 2026 Conference Submission 13387 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Continuous-time, multi-agent reinforcement learning, physics-informed neural networks
TL;DR: This paper leverages physics-informed neural networks combined with value gradient iteration to deal with continuous-time multi-agent reinforcement learning problems.
Abstract: Existing reinforcement learning (RL) methods struggle with complex dynamical systems that demand interactions at high frequencies or irregular time intervals. Continuous-time RL (CTRL) has emerged as a promising alternative by replacing discrete-time Bellman recursion with differentiable value functions defined as viscosity solutions of the Hamilton–Jacobi–Bellman (HJB) equation. While CTRL has shown promise, its applications have been largely limited to the single-agent domain. This limitation stems from two key challenges: (i) conventional methods for solving HJB equations suffer from the curse of dimensionality (CoD), making them intractable in high-dimensional systems; and (ii) even with learning-based approaches to alleviate the CoD, accurately approximating centralized value functions in multi-agent settings remains difficult, which in turn destabilizes policy training. In this paper, we propose a continuous-time multi-agent RL (CT-MARL) framework that uses physics-informed neural networks (PINNs) to approximate HJB-based value functions at scale. To ensure the value is consistent with its differential structure, we align value learning with value-gradient learning by introducing a Value Gradient Iteration (VGI) module that iteratively refines value gradients along trajectories. This improves gradient accuracy, in turn yielding more precise value approximations and stronger policy learning. We evaluate our method on continuous-time variants of standard benchmarks, including the multi-agent particle environment (MPE) and multi-agent MuJoCo. Our results demonstrate that our approach consistently outperforms existing continuous-time RL baselines and scales to complex cooperative multi-agent dynamics.
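For context, the HJB equation referenced in the abstract can be stated in its standard deterministic, finite-horizon form (this is generic background, not necessarily the paper's exact multi-agent formulation):

$$
\frac{\partial V}{\partial t}(x,t) + \max_{u}\Big\{ r(x,u) + \nabla_x V(x,t)^{\top} f(x,u) \Big\} = 0,
\qquad V(x,T) = g(x),
$$

where $f$ denotes the system dynamics, $r$ the instantaneous reward rate, and $g$ the terminal reward. A PINN-style approach, as described above, trains a value network $V_\theta$ by penalizing the residual of this PDE (and the boundary condition) at sampled states and times, which is where accurate value gradients $\nabla_x V_\theta$ become critical.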
Primary Area: reinforcement learning
Submission Number: 13387