State Chrono Representation for Enhancing Generalization in Reinforcement Learning

23 Sept 2023 (modified: 11 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Deep Reinforcement Learning; Representation Learning; Bisimulation Metric
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Developing a robust and generalizable state representation is essential for overcoming the challenges posed by reinforcement learning tasks that take images as input. Recent developments in metric learning, such as deep bisimulation metric approaches, transform states into structured representation spaces in which distances can be measured based on task-relevant features. However, these approaches struggle with demanding generalization tasks and sparse-reward scenarios: their one-step update strategy often fails to capture sufficient long-term behavioral information in the learned representations. To address these challenges, we present the State Chrono Representation (SCR) approach, which enriches state representations with long-term information alongside the bisimulation metric. SCR learns state distances and measurements within a temporal framework, accounting for future dynamics and the rewards accumulated between current and long-term future states. The resulting representation space captures sequential behavioral information and integrates distances and measurements from the present to the future. This temporal-aware learning strategy adds few additional parameters for modeling dynamics, keeping the overall learning process efficient. Comprehensive experiments in DeepMind Control environments show that SCR achieves state-of-the-art performance on demanding generalization tasks and in sparse-reward scenarios.
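The abstract builds on the bisimulation-metric idea of measuring state distances through reward differences and successor-state distances. As a point of reference (this is a generic one-step bisimulation-style objective in the spirit of prior deep bisimulation work, not the SCR method itself; the function names and the use of NumPy are illustrative assumptions), a minimal sketch of such a distance target and loss might look like:

```python
import numpy as np

def bisimulation_targets(rewards, next_z, gamma=0.99):
    """One-step bisimulation-style distance targets for a batch of states.

    Illustrative sketch: the target distance between states i and j combines
    the immediate reward difference |r_i - r_j| with the discounted distance
    between the embeddings of their successor states.
    """
    r_diff = np.abs(rewards[:, None] - rewards[None, :])                 # |r_i - r_j|
    next_dist = np.linalg.norm(next_z[:, None] - next_z[None, :], axis=-1)
    return r_diff + gamma * next_dist

def bisimulation_loss(z, rewards, next_z, gamma=0.99):
    """MSE between current pairwise embedding distances and the targets."""
    dist = np.linalg.norm(z[:, None] - z[None, :], axis=-1)
    target = bisimulation_targets(rewards, next_z, gamma)
    return float(np.mean((dist - target) ** 2))

# Toy batch of 4 states with 8-dimensional embeddings.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))        # current-state embeddings
next_z = rng.normal(size=(4, 8))   # successor-state embeddings
rewards = rng.normal(size=4)
loss = bisimulation_loss(z, rewards, next_z)
```

The limitation the abstract points to is visible here: the target looks only one transition ahead, so long-horizon behavioral differences enter the representation only indirectly through bootstrapping, which is what SCR's temporal framework is designed to address.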
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7142