Revealing Dominant Eigendirections via Spectral Non-Robustness Analysis in the Deep Reinforcement Learning Policy Manifold


22 Sept 2022, 12:38 (modified: 26 Oct 2022, 14:14)
ICLR 2023 Conference Blind Submission
Readers: Everyone
Abstract: Deep neural policies have recently been deployed in a diverse set of settings, from biotechnology to automated financial systems. However, using deep neural networks to approximate the state-action value function raises concerns about decision boundary stability, in particular the sensitivity of policy decision making to imperceptible, non-robust features arising from highly non-convex and complex deep neural manifolds. These concerns obstruct our understanding of the reasoning of deep neural policies and of their fundamental limitations. It is therefore crucial to develop techniques for understanding the sensitivities in the learnt representations of neural network policies. To this end, we introduce a method that identifies the dominant eigendirections via spectral analysis of non-robust directions in the deep neural policy decision boundary across both time and space. Through experiments in the Arcade Learning Environment (ALE), we demonstrate the effectiveness of our spectral analysis algorithm for identifying correlated non-robust directions and for measuring how sample shifts reshape the set of sensitive directions in the neural policy landscape. Most importantly, we show that state-of-the-art adversarial training techniques lead to learning sparser high-sensitivity directions, with dramatically larger oscillations over time, compared to standard training. We believe our results reveal fundamental properties of the decision process of deep reinforcement learning policies and can help in constructing safe, reliable, and value-aligned deep neural policies.
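The abstract does not spell out the algorithm, so the following is only a minimal sketch of the generic idea of finding dominant eigendirections among per-state sensitivity vectors. It is not the authors' method. It assumes each visited state contributes one sensitivity vector (for example, a gradient of a policy loss with respect to the input), stacks them, and takes the top eigenvectors of the averaged outer-product matrix. The function name `dominant_eigendirections`, the synthetic data, and all parameters are hypothetical.

```python
import numpy as np


def dominant_eigendirections(directions, k=3):
    """Top-k eigenpairs of the averaged outer-product matrix of the rows.

    `directions` is assumed to have shape (n_states, dim), one sensitivity
    vector per state. This is an illustrative assumption, not the paper's
    definition.
    """
    G = np.asarray(directions, dtype=float)
    M = G.T @ G / G.shape[0]              # (dim, dim), symmetric PSD
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest
    return eigvals[order], eigvecs[:, order]


# Synthetic stand-in for per-state sensitivity vectors: one shared direction
# (the first coordinate axis) plus small isotropic noise.
rng = np.random.default_rng(0)
dim, n_states = 16, 200
shared = np.zeros(dim)
shared[0] = 1.0
directions = (
    rng.normal(size=(n_states, 1)) * 3.0 * shared
    + 0.1 * rng.normal(size=(n_states, dim))
)

vals, vecs = dominant_eigendirections(directions, k=2)
# |cosine| between the leading eigenvector and the planted shared direction
alignment = abs(vecs[:, 0] @ shared)
```

In this toy setting the leading eigenvector recovers the planted shared direction, and the gap between the first and second eigenvalues indicates how strongly the sensitivity vectors are correlated across states.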
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (e.g., AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)
5 Replies