Non-ergodicity in reinforcement learning: robustness via ergodic transformations

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Reinforcement learning, Ergodicity, Reward transformation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Optimizing the long-term performance instead of the expected accumulated reward enables learning of robust policies in non-ergodic environments.
Abstract: Envisioned application areas for reinforcement learning (RL) include autonomous driving, precision agriculture, and finance, which all require RL agents to make decisions in the real world. A significant challenge hindering the adoption of RL methods in these domains is the non-robustness of conventional algorithms. In this paper, we argue that a fundamental issue contributing to this lack of robustness lies in the focus on the expected value of the accumulated reward as the sole “correct” optimization objective. The expected value is the average over the statistical ensemble of infinitely many trajectories. For non-ergodic rewards, this average differs from the average over a single but infinitely long trajectory. Consequently, optimizing the expected value can lead to policies that yield exceptionally high rewards with probability zero but almost surely result in catastrophic outcomes. This problem can be circumvented by transforming the time series of collected rewards into one with ergodic increments. This transformation enables learning robust policies by optimizing the long-term reward for individual agents rather than the average across infinitely many trajectories. We propose an algorithm for learning ergodic transformations from data and demonstrate its effectiveness in an instructive environment with non-ergodic rewards and on standard RL benchmarks.
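The sketch below is not part of the submission; it is the standard multiplicative coin-toss gamble, included here only to make the abstract's ensemble-versus-time-average distinction concrete. All parameters (the 1.5/0.6 growth factors, the logarithm as the ergodic transformation) are illustrative assumptions, not the paper's learned transformation or benchmarks.

```python
# Illustrative sketch (an assumption, not code from the submission):
# each step multiplies "wealth" by 1.5 (heads) or 0.6 (tails) with equal
# probability. The expected per-step factor is 1.05, so the ensemble
# average E[W_T] = 1.05**T grows exponentially; yet the time-average
# growth rate, 0.5*(ln 1.5 + ln 0.6) ~ -0.05, is negative, so almost
# every individual trajectory decays to zero. Taking the logarithm of
# the wealth trajectory plays the role of an ergodic transformation:
# its increments have a time average equal to their ensemble average.
import numpy as np

rng = np.random.default_rng(0)
n_traj, n_steps = 10_000, 1_000
factors = rng.choice([1.5, 0.6], size=(n_traj, n_steps))
wealth = np.cumprod(factors, axis=1)

print("ensemble-average final wealth (analytic):", 1.05 ** n_steps)        # huge
print("median final wealth (simulated):         ", np.median(wealth[:, -1]))  # ~0

# Log-increments are ergodic: the per-trajectory time average matches
# the average over the ensemble of trajectories.
log_inc = np.log(factors)
print("time-average growth, one trajectory:", log_inc[0].mean())
print("ensemble-average growth:            ", log_inc.mean())
```

Under these assumed dynamics, a policy maximizing the expected (ensemble-average) wealth would accept the gamble, while maximizing the time-average growth rate of the transformed series would reject it, which is the robustness gap the abstract describes.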
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5611