Learning Achievement Structure for Structured Exploration in Domains with Sparse Reward


22 Sept 2022, 12:41 (modified: 16 Nov 2022, 05:40)
ICLR 2023 Conference Blind Submission
Readers: Everyone
Keywords: deep reinforcement learning, structured exploration
Abstract: We propose Structured Exploration with Achievements (SEA), a multi-stage reinforcement learning algorithm for environments that have an internal achievement system. SEA learns the environment structure from offline data, then uses the learned structure to learn different skills and improve overall exploration through online environment interactions. SEA first uses a contrast-based loss function to learn achievement representations and build an achievement classifier. It then tries to recover the environment's achievement structure with a heuristic algorithm. Finally, SEA builds a meta-controller from the recovered structure to learn sub-policies and explore new tasks. Although exploration in a procedurally generated environment with high-dimensional input such as images is extremely hard for reinforcement learning agents, we demonstrate that SEA is still able to recover the underlying structure and explore new tasks in different domains.
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
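
The final stage described in the abstract, a meta-controller that uses the recovered achievement structure to decide which sub-policy to run, can be illustrated with a small sketch. The abstract does not say how the structure is represented, so this assumes a prerequisite graph mapping each achievement to the achievements it depends on. The achievement names, `DEPS`, `frontier`, and `plan_to` are all hypothetical and are not taken from the submission; this is not the authors' implementation.

```python
from typing import Dict, List, Set

# Hypothetical recovered structure: achievement -> set of prerequisites.
# Assumed representation; the submission may store the structure differently.
DEPS: Dict[str, Set[str]] = {
    "collect_wood": set(),
    "place_table": {"collect_wood"},
    "make_pickaxe": {"place_table"},
    "collect_stone": {"make_pickaxe"},
}


def frontier(deps: Dict[str, Set[str]], unlocked: Set[str]) -> List[str]:
    """Return achievements not yet unlocked whose prerequisites are all unlocked.

    These are the natural candidates for exploring a new task.
    """
    return sorted(
        a for a, pre in deps.items() if a not in unlocked and pre <= unlocked
    )


def plan_to(deps: Dict[str, Set[str]], target: str) -> List[str]:
    """Return achievements in prerequisite-first order ending at `target`.

    A meta-controller could invoke one sub-policy per entry, in this order.
    """
    order: List[str] = []
    seen: Set[str] = set()

    def visit(achievement: str) -> None:
        if achievement in seen:
            return
        seen.add(achievement)
        for prerequisite in sorted(deps[achievement]):
            visit(prerequisite)
        order.append(achievement)

    visit(target)
    return order


if __name__ == "__main__":
    print(frontier(DEPS, {"collect_wood"}))
    print(plan_to(DEPS, "collect_stone"))
```

Under this assumed representation, `frontier` lists the achievements an agent could attempt next, and `plan_to` gives one order in which sub-policies could be chained to reach a deeper achievement.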