Hierarchical Orchestra of Policies

Published: 09 Oct 2024, Last Modified: 02 Dec 2024
Venue: NeurIPS 2024 Workshop IMOL (Poster)
License: CC BY 4.0
Track: Tiny paper track
Keywords: Continual Learning, Reinforcement Learning, Mitigating Catastrophic Forgetting, Hierarchical Reinforcement Learning, Lifelong Learning
TL;DR: We introduce a method that mitigates catastrophic forgetting in deep reinforcement learning by hierarchically orchestrating snapshots of earlier policies.
Abstract: Continual reinforcement learning poses a major challenge due to the tendency of agents to experience catastrophic forgetting when learning sequential tasks. In this paper, we introduce a modularity-based approach, called Hierarchical Orchestra of Policies (HOP), designed to mitigate catastrophic forgetting in lifelong reinforcement learning. HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks. Unlike other state-of-the-art methods, HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous. Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks and performs comparably to state-of-the-art transfer methods that require task labelling. Moreover, HOP achieves this without compromising performance when tasks remain constant, highlighting its versatility.
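To make the gating idea in the abstract concrete, below is a minimal Python sketch: a router that sends each observation either to the live policy or to a frozen snapshot of an earlier policy, chosen by similarity to observations remembered from previously successful tasks. This is a sketch under stated assumptions, not the paper's implementation: the class name HierarchicalOrchestra, the cosine similarity metric, the fixed threshold, and the per-snapshot observation memory are all illustrative choices; the abstract does not specify HOP's actual similarity metric or hierarchy construction.

```python
import numpy as np

class HierarchicalOrchestra:
    """Illustrative sketch of an observation-similarity-gated policy hierarchy.

    Assumptions (not from the paper): cosine similarity, a fixed gating
    threshold, and a flat list of (policy, observation-memory) snapshots.
    """

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.snapshots = []  # list of (policy_fn, memory) pairs

    def add_snapshot(self, policy_fn, successful_obs: np.ndarray):
        """Freeze a policy together with observations from tasks it solved."""
        self.snapshots.append((policy_fn, successful_obs))

    def _similarity(self, obs: np.ndarray, memory: np.ndarray) -> float:
        # Max cosine similarity between obs and the remembered observations.
        memory_norm = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
        obs_norm = obs / (np.linalg.norm(obs) + 1e-8)
        return float(np.max(memory_norm @ obs_norm))

    def act(self, obs: np.ndarray, current_policy):
        """Delegate to the most similar past snapshot, else the live policy."""
        best_policy, best_sim = current_policy, self.threshold
        for policy_fn, memory in self.snapshots:
            sim = self._similarity(obs, memory)
            if sim > best_sim:
                best_policy, best_sim = policy_fn, sim
        return best_policy(obs)

# Example usage with trivial stand-in policies (purely illustrative):
orchestra = HierarchicalOrchestra(threshold=0.9)
orchestra.add_snapshot(lambda o: 0, successful_obs=np.eye(4))  # frozen policy
action = orchestra.act(np.array([1.0, 0.0, 0.0, 0.0]), current_policy=lambda o: 1)
```

Note the design consequence: routing on individual observations rather than task identifiers is what lets this style of gating operate without task labelling, which matches the abstract's claim that HOP adapts even when boundaries between tasks are ambiguous.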
Submission Number: 9