Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization

Abdullah Akgül; Gulcin Baykal; Manuel Haussmann; Melih Kandemir

Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization

Abdullah Akgül, Gulcin Baykal, Manuel Haussmann, Melih Kandemir

Published: 05 Nov 2025, Last Modified: 05 Nov 2025Accepted by TMLREveryoneRevisionsBibTeXCC BY 4.0

Authors that are also TMLR Expert Reviewers: ~Manuel_Haussmann1

Abstract: Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. The time-dependency of the state transition dynamics aggravates the notorious stability problems of model-free deep actor-critic architectures. We posit that two properties will play a key role in overcoming non-stationarity in transition dynamics: (i)~preserving the plasticity of the critic network and (ii) directed exploration for rapid adaptation to changing dynamics. We show that performing on-policy reinforcement learning with an evidential critic provides both. The evidential design ensures a fast and accurate approximation of the uncertainty around the state value, which maintains the plasticity of the critic network by detecting the distributional shifts caused by changes in dynamics. The probabilistic critic also makes the actor training objective a random variable, enabling the use of directed exploration approaches as a by-product. We name the resulting algorithm \emph{Evidential Proximal Policy Optimization (EPPO)} due to the integral role of evidential uncertainty quantification in both policy evaluation and policy improvement stages. Through experiments on non-stationary continuous control tasks, where the environment dynamics change at regular intervals, we demonstrate that our algorithm outperforms state-of-the-art on-policy reinforcement learning variants in both task-specific and overall return.

Certifications: Expert Certification

Submission Length: Regular submission (no more than 12 pages of main content)

Code: https://github.com/adinlab/EPPO

Supplementary Material: zip

Assigned Action Editor: ~Pablo_Samuel_Castro1

Submission Number: 5551

Loading