TL;DR: We unify filtering and control capabilities in a single RL policy network, achieving SOTA noise robustness and action smoothness in real-world control tasks.
Abstract: Deep reinforcement learning (RL) is effective for decision-making and control tasks like autonomous driving and embodied AI. However, RL policies often suffer from the action fluctuation problem in real-world applications, resulting in severe actuator wear, safety risks, and performance degradation. This paper identifies two fundamental causes of action fluctuation: observation noise and policy non-smoothness. We propose LipsNet++, a novel policy network with a Fourier filter layer and a Lipschitz controller layer that address the two causes separately. The filter layer incorporates a trainable filter matrix that automatically extracts important frequencies while suppressing noise frequencies in the observations. The controller layer introduces a Jacobian regularization technique to achieve a low Lipschitz constant, ensuring smooth fitting of the policy function. These two layers function analogously to the filter and controller in classical control theory, suggesting that filtering and control capabilities can be seamlessly integrated into a single policy network. Both simulated and real-world experiments demonstrate that LipsNet++ achieves state-of-the-art noise robustness and action smoothness. The code and videos are publicly available at https://xjsong99.github.io/LipsNet_v2.
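The two layers described above can be sketched in PyTorch. This is a minimal illustration, not the paper's exact architecture: the class and function names, the per-frequency gain parameterization, and the squared-gradient form of the Jacobian penalty are all assumptions for exposition.

```python
import torch
import torch.nn as nn

class FourierFilterLayer(nn.Module):
    """Hypothetical sketch of a trainable frequency-domain filter:
    learnable per-frequency gains scale the observation spectrum,
    so noise frequencies can be suppressed during training."""
    def __init__(self, seq_len: int):
        super().__init__()
        n_freq = seq_len // 2 + 1                 # length of rfft output
        self.gains = nn.Parameter(torch.ones(n_freq))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len) observation history
        spec = torch.fft.rfft(x, dim=-1)          # to frequency domain
        spec = spec * self.gains                  # reweight frequencies
        return torch.fft.irfft(spec, n=x.shape[-1], dim=-1)

def jacobian_penalty(policy: nn.Module, obs: torch.Tensor) -> torch.Tensor:
    """One common Jacobian regularizer (a sketch, not necessarily the
    paper's formulation): penalize the squared norm of d(action)/d(obs)
    to encourage a low Lipschitz constant, i.e. a smoother policy."""
    obs = obs.clone().requires_grad_(True)
    act = policy(obs).sum()                       # scalar for autograd.grad
    (grad,) = torch.autograd.grad(act, obs, create_graph=True)
    return grad.pow(2).sum()
```

In training, the penalty would typically be added to the RL loss with a weighting coefficient, trading off action smoothness against task performance.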
Lay Summary: Deep reinforcement learning (RL) systems used in autonomous vehicles and robots often produce unstable, jittery actions due to noisy sensor data and overly sensitive decision-making algorithms, causing accelerated hardware wear and safety risks. In response, we introduce LipsNet++, a unified policy architecture inspired by classical control theory that embeds an adaptive filtering stage—analogous to noise-canceling headphones removing spurious signals—and a Lipschitz smoothing stage—akin to shock absorbers damping abrupt motions. Experimental validation across simulated and physical platforms shows LipsNet++ substantially reduces action fluctuations compared to standard deep RL policies. By reducing action fluctuation, LipsNet++ enhances the robustness, reliability and lifespan of AI systems operating in complex, unpredictable environments.
Link To Code: https://xjsong99.github.io/LipsNet_v2
Primary Area: Reinforcement Learning->Deep RL
Keywords: Deep Reinforcement Learning, Policy Network Design, Action Fluctuation, Control Smoothness and Robustness
Submission Number: 9968