Look Ma, No Training! Observation Space Design for Reinforcement Learning

21 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: observation space design, real-world reinforcement learning
TL;DR: We propose using random policies and random rollouts to identify which state features are important, without training an RL agent multiple times.
Abstract: Many scientific communities agree on the potential of reinforcement learning (RL) agents to solve real-world problems, yet such consensus does not extend to how these agents should be designed. For many practical applications, the growing RL literature does not shed light on which RL components work best for a particular problem; they are usually treated merely as configuration elements to be reported. One of these components is the choice of observation space, which in some cases entails dealing with tens of thousands of observable features. Choosing a rich yet efficient observation space is key to encoding useful information while limiting the practical cost of adding extra features. Understanding feature relevance has already been studied in RL. Compared to supervised learning, dependencies across states add a layer of complexity to the structure of the problem. Many of the proposed methods require training RL agents from scratch several times, which is costly in real-world applications. In this paper, we propose a simple and cost-efficient way to find good observation spaces that does not require training. Specifically, we propose leveraging multiple random policies when comparing candidate spaces for the same problem. By conducting rollouts with different random policies for each candidate space, we are able to identify statistically significant signals that indicate which features are better suited to the application considered. We demonstrate the usefulness of our approach on different RL problems, including Traffic Signal Control. By combining random policy sampling with the Hill Climbing search algorithm, we find observation spaces that use fewer features and achieve comparable or greater return. Overall, this work suggests a straightforward and inexpensive approach to an often-overlooked aspect of RL design that is crucial for applied problems.
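
To make the idea concrete, below is a minimal sketch (not the authors' code) of the procedure the abstract describes: for each candidate observation space, draw several randomly initialised policies whose input is restricted to that feature subset, roll each one out, and compare the resulting return distributions with a statistical test; a simple Hill Climbing loop then searches over feature subsets using the same score. The environment (`CartPole-v1`), the linear policy class, the candidate feature subsets, and the helper names (`rollout_return`, `evaluate_space`, `hill_climb`) are illustrative assumptions, not details taken from the paper.

```python
# Sketch of random-policy evaluation of candidate observation spaces.
# Assumes a Gymnasium environment with a discrete action space.
import numpy as np
from scipy import stats
import gymnasium as gym


def rollout_return(env, weights, feature_idx, max_steps=500):
    """Run one episode with a fixed random linear policy that only sees
    the candidate feature subset; return the undiscounted episode return."""
    obs, _ = env.reset()
    total, step = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated) and step < max_steps:
        logits = weights @ obs[feature_idx]      # linear policy over the feature subset
        action = int(np.argmax(logits))          # deterministic given the sampled weights
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        step += 1
    return total


def evaluate_space(env, feature_idx, n_policies=30, seed=0):
    """Return distribution over n_policies randomly drawn linear policies."""
    rng = np.random.default_rng(seed)
    n_actions = env.action_space.n
    returns = []
    for _ in range(n_policies):
        weights = rng.normal(size=(n_actions, len(feature_idx)))
        returns.append(rollout_return(env, weights, feature_idx))
    return np.array(returns)


def hill_climb(env, all_features, n_policies=30, seed=0):
    """Greedy search over feature subsets, scored by mean random-policy return:
    repeatedly drop a feature if the smaller space scores no worse."""
    current = list(all_features)
    best_score = evaluate_space(env, current, n_policies, seed).mean()
    improved = True
    while improved:
        improved = False
        for f in list(current):
            candidate = [x for x in current if x != f]
            if not candidate:
                continue
            score = evaluate_space(env, candidate, n_policies, seed).mean()
            if score >= best_score:
                current, best_score, improved = candidate, score, True
                break
    return current, best_score


if __name__ == "__main__":
    env = gym.make("CartPole-v1")
    space_a, space_b = [0, 1, 2, 3], [0, 2]      # two candidate feature subsets (illustrative)
    ret_a = evaluate_space(env, space_a)
    ret_b = evaluate_space(env, space_b)
    # Rank-sum test on the two return distributions: a significant difference
    # suggests one observation space is better suited to the task.
    print(stats.mannwhitneyu(ret_a, ret_b, alternative="two-sided"))
    print(hill_climb(env, [0, 1, 2, 3]))
```

The key design point this sketch illustrates is that no policy is ever trained: each candidate space is scored only by the spread of returns obtained from many cheap, randomly initialised policies, so the comparison cost is dominated by rollouts rather than optimisation.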
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3110