ODE-based Smoothing Neural Network for Reinforcement Learning Tasks

Yinuo Wang; Wenxuan Wang; Xujie Song; Tong Liu; Yuming Yin; Liangfa Chen; Likun Wang; Jingliang Duan; Shengbo Eben Li

Published: 22 Jan 2025, Last Modified: 01 May 2025. Venue: ICLR 2025 Spotlight. License: CC BY 4.0

Keywords: Reinforcement Learning, Smooth Control, Low-pass Filter, Neural ODE

TL;DR: This paper proposes a neural unit with smoothing properties and builds a smoothing policy network on top of it. It is the first work to use the Neural ODE method for action smoothing in deep reinforcement learning.

Abstract: The smoothness of control actions is a significant challenge for deep reinforcement learning (RL) techniques in solving optimal control problems. Existing RL-trained policies tend to produce non-smooth actions because of high-frequency input noise and unconstrained Lipschitz constants in neural networks. This article presents a Smooth ODE (SmODE) network that addresses both causes of non-smooth control actions simultaneously, thereby enhancing policy performance and robustness under noisy conditions. We first design a smooth ODE neuron with a first-order low-pass filtering expression, which dynamically filters out high-frequency noise in the hidden state through a learnable, state-based system time constant. Additionally, we construct a state-based mapping function, $g$, and theoretically demonstrate its capacity to control the ODE neuron's Lipschitz constant. Based on this neuron design, we then develop the SmODE network to serve as an RL policy approximator. This network is compatible with most existing RL algorithms, offering improved adaptability compared to prior approaches. Various experiments show that the SmODE network achieves superior anti-interference capability and smoother action outputs than the multi-layer perceptron and smooth network architectures such as LipsNet.
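To make the low-pass idea in the abstract concrete, the following is a minimal sketch of a generic first-order low-pass neuron with a state-dependent time constant, integrated with explicit Euler steps. It is not the paper's SmODE formulation: the dynamics $\dot{h} = (\tanh(w_{in} x) - h) / \tau(h)$, the softplus parameterization of $\tau$, and the names `lowpass_step`, `w_in`, `w_tau`, and `tau_min` are all assumptions chosen for illustration, and the mapping function $g$ and its Lipschitz analysis are not reproduced here.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); always strictly positive.
    return np.logaddexp(0.0, z)

def lowpass_step(h, x, w_in, w_tau, dt=0.05, tau_min=0.1):
    """One explicit-Euler step of dh/dt = (tanh(w_in * x) - h) / tau(h).

    Hypothetical illustration: tau depends on the hidden state h and is
    kept above tau_min so the filter never becomes infinitely fast.
    """
    target = np.tanh(w_in * x)            # instantaneous (unfiltered) response
    tau = tau_min + softplus(w_tau * h)   # state-based time constant, > tau_min
    return h + dt * (target - h) / tau

def total_variation(seq):
    # Sum of absolute step-to-step changes; a simple smoothness measure.
    return float(np.sum(np.abs(np.diff(seq))))

# A slow sine wave corrupted by high-frequency Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(0, 10, 0.05)
noisy = np.sin(t) + 0.3 * rng.standard_normal(t.shape)

h = 0.0
filtered = []
for x in noisy:
    h = lowpass_step(h, x, w_in=1.0, w_tau=1.0)
    filtered.append(h)
filtered = np.array(filtered)

print("TV of unfiltered response:", total_variation(np.tanh(noisy)))
print("TV of filtered hidden state:", total_variation(filtered))
```

In this sketch each step moves the hidden state only a fraction `dt / tau` of the way toward the instantaneous response, so rapid fluctuations in the input are attenuated while slow trends pass through. In a learned policy, `w_in` and `w_tau` would be trainable parameters rather than the fixed values used here.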

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7391
