Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: applications to robotics, autonomy, planning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: reinforcement learning, smoothness, benchmark
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Reinforcement learning (RL) policies are prone to high-frequency oscillations, which are especially undesirable when deploying to hardware in the real world. In this paper, we identify, categorize, and compare methods from the literature that mitigate high-frequency oscillations in RL. We define two broad classes: loss regularization and architectural methods. At their core, both incentivize learning a smooth mapping, such that nearby states in the input space produce nearby actions in the output space. We present benchmarks of policy performance and smoothness on standard RL environments from Gymnasium, as well as on two robotic locomotion tasks that include deployment and evaluation in the real world. Finally, we propose hybrid methods that combine elements of both loss regularization and architectural methods, and that outperform the existing approaches both in the simulation benchmarks and in the real world.
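To make the loss-regularization class concrete, below is a minimal sketch of a smoothness penalty in the spirit of CAPS-style regularization (penalizing action differences across temporally adjacent and randomly perturbed states). This is an illustration, not the paper's implementation; the names (`policy`, `states`, `next_states`, `sigma`, `lambda_t`, `lambda_s`) and the choice of a deterministic PyTorch policy are assumptions.

```python
import torch
import torch.nn as nn

def smoothness_regularizers(policy: nn.Module,
                            states: torch.Tensor,
                            next_states: torch.Tensor,
                            sigma: float = 0.05,
                            lambda_t: float = 1.0,
                            lambda_s: float = 1.0) -> torch.Tensor:
    """Loss term encouraging nearby states to map to nearby actions.

    Hypothetical sketch of a loss-regularization smoothness method;
    assumes `policy` is a deterministic state -> action network.
    """
    actions = policy(states)

    # Temporal smoothness: actions should not jump between consecutive states.
    temporal = (policy(next_states) - actions).pow(2).mean()

    # Spatial smoothness: actions should vary little under small Gaussian
    # perturbations of the state (scale `sigma` is a tunable assumption).
    perturbed = states + sigma * torch.randn_like(states)
    spatial = (policy(perturbed) - actions).pow(2).mean()

    return lambda_t * temporal + lambda_s * spatial
```

In a setup like this, the returned term would simply be added to the actor's loss during training, trading a small amount of task reward for reduced high-frequency action oscillation; architectural methods instead build the smoothness bias into the policy network itself (e.g., via output filtering or constrained layers).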
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6989