Flowing Through States: Neural ODE Regularization for Reinforcement Learning

ICLR 2026 Conference Submission 21483 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Neural ODE, Reinforcement Learning, MDP, Regularization, Actor-Critic, A2C
TL;DR: We regularize reinforcement learning by modeling latent state transitions in MDPs as neural ODE flows, leading to improved stability and performance across standard benchmarks.
Abstract: Neural networks applied to sequential decision-making tasks typically rely on latent representations of environment states. While environment dynamics dictate how semantic states evolve, the corresponding latent transitions are usually left implicit, leaving room for misalignment between the two. We propose to model latent dynamics explicitly by drawing an analogy between Markov decision process (MDP) trajectories and ordinary differential equation (ODE) flows: in both cases, the current state fully determines its successors. Building on this view, we introduce a neural ODE–based regularization method that encourages latent embeddings to follow consistent ODE flows, thereby aligning representation learning with environment dynamics. Although broadly applicable to deep learning agents, we demonstrate its effectiveness in reinforcement learning by integrating it into an Actor–Critic algorithm, where it yields substantial performance gains across standard Atari benchmarks.
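
The submission page does not include code, but a minimal sketch may help convey how such a regularizer could attach to an actor-critic objective. The sketch below assumes a PyTorch setup with a latent encoder, a learned vector field defining the ODE flow, and a single explicit Euler step as the ODE solve; all names (LatentODEFlow, ode_consistency_loss, the loss coefficients) and the one-hot action encoding are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentODEFlow(nn.Module):
    """Learned vector field f_theta governing latent dynamics dz/dt = f_theta(z, a)."""

    def __init__(self, latent_dim: int, action_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # a is assumed to be a one-hot (or embedded) action tensor for discrete actions.
        return self.net(torch.cat([z, a], dim=-1))


def ode_consistency_loss(encoder: nn.Module,
                         flow: LatentODEFlow,
                         obs_t: torch.Tensor,
                         action_t: torch.Tensor,
                         obs_tp1: torch.Tensor,
                         dt: float = 1.0) -> torch.Tensor:
    """Penalize the gap between the ODE-predicted latent and the encoded next latent.

    A single explicit Euler step z_t + dt * f_theta(z_t, a_t) stands in for the
    ODE solve here; a higher-order solver could be substituted.
    """
    z_t = encoder(obs_t)
    z_tp1 = encoder(obs_tp1).detach()          # stop-gradient target from the next observation
    z_pred = z_t + dt * flow(z_t, action_t)    # one Euler step along the learned flow
    return F.mse_loss(z_pred, z_tp1)


# Hypothetical integration into an A2C update: the regularizer is added to the
# usual actor-critic objective with a weighting coefficient, e.g.
# total_loss = policy_loss + value_coef * value_loss + ode_coef * ode_consistency_loss(...)
```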
Primary Area: reinforcement learning
Submission Number: 21483