Can Agent Learn Robust Locomotion Skills without Modeling Environmental Observation Noise?

20 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Deep reinforcement learning, self-supervised masked augmentation, locomotion control, de-noising
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Inspired by multisensory integration mechanisms in mammalian brains, this paper improves the robustness of DRL-based locomotion agents against observation noise by learning the correlation of multivariate time series without noise modeling.
Abstract: Deep Reinforcement Learning (DRL) has recently been widely applied to locomotion control problems. In these settings, DRL agents observe the environment through multi-sensor signals, which are usually accompanied by unpredictable noise or errors. As a result, policies trained well in simulation are prone to collapse in reality. Existing solutions typically model environmental noise explicitly and perform optimal state estimation based on that model. However, real-world tasks exhibit non-stationary noise that is intractable to model. Moreover, these extra noise-modeling procedures often cause a noticeable decrease in learning efficiency. Since multi-sensor observation signals are inherently correlated, this correlation can be used to recover an optimal state estimate from noisy environmental observations without modeling the noise explicitly. Inspired by the multisensory integration mechanism in the mammalian brain, we propose a novel Self-supervised randomIzed Masked Augmentation (SIMA) algorithm. SIMA adopts a self-supervised learning approach to discover the correlation of multivariate time series and to reconstruct an optimal state representation from disturbed observations in latent space, with a theoretical guarantee. Empirical results show that SIMA learns robust locomotion skills under environmental observation noise and outperforms state-of-the-art baselines by 15.7% in learning performance.
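The abstract's idea of randomized masking over a multivariate time series can be illustrated with a minimal sketch. This is not the SIMA implementation: the abstract does not specify the masking scheme, the mask ratio, or the loss. The function names `random_mask` and `masked_reconstruction_loss`, the value `mask_ratio=0.3`, and the toy sine/cosine "sensor" channels are all assumptions made for illustration only.

```python
import numpy as np


def random_mask(window, mask_ratio, rng):
    """Zero out a random subset of entries in a (time, sensors) window.

    Returns the masked window and a boolean array marking hidden entries.
    Entry-wise masking is an assumption; the source does not state the scheme.
    """
    hidden = rng.random(window.shape) < mask_ratio
    masked = np.where(hidden, 0.0, window)
    return masked, hidden


def masked_reconstruction_loss(prediction, target, hidden):
    """Mean squared error computed only over the hidden entries."""
    if not hidden.any():
        return 0.0
    diff = (prediction - target)[hidden]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16)
# Two correlated toy "sensor" channels (hypothetical data, not from the paper).
window = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)

masked, hidden = random_mask(window, mask_ratio=0.3, rng=rng)

# A perfect reconstruction scores zero; the masked input itself generally does not.
perfect = masked_reconstruction_loss(window, window, hidden)
naive = masked_reconstruction_loss(masked, window, hidden)
```

In a self-supervised setup of this kind, a model would take `masked` as input and be trained to lower the loss against `window`, which it can only do by exploiting correlations between channels and across time steps.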
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2436