Keywords: Continual Learning, Symmetry, Manipulation
TL;DR: We propose a task-agnostic vision-based continual RL algorithm that grows a policy for each task group that contains equivariant tasks, instead of a single task, and automatically detects task group delineations in an unsupervised manner.
Abstract: Continual reinforcement learning aims to sequentially learn a variety of tasks, retaining the ability to perform previously encountered tasks while simultaneously developing new policies for novel tasks. However, current continual RL approaches overlook the fact that certain tasks are equivalent under basic group operations such as rotations or translations, especially with visual inputs. They may unnecessarily learn and maintain a new policy for each such task, leading to poor sample efficiency and weak generalization capability. To address this, we introduce a Continual Vision-based Reinforcement Learning method that recognizes Group Symmetries, called COVERS, which cultivates a policy for each group of equivalent tasks rather than for each individual task. COVERS employs a proximal policy optimization (PPO)-based algorithm to train each policy, which contains an equivariant feature extractor and takes inputs from different modalities, including image observations and robot proprioceptive states. It also uses an unsupervised task grouping mechanism that relies on the 1-Wasserstein distance between the extracted invariant features. We evaluate COVERS on a sequence of table-top manipulation tasks in simulation and on a real robot platform. Our results show that COVERS accurately assigns tasks to their respective groups and significantly outperforms baselines by generalizing to unseen but equivariant tasks in seen task groups. Demos are available on our project page: https://sites.google.com/view/rl-covers/.
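The task grouping idea in the abstract can be illustrated with a minimal sketch. It is not the COVERS implementation: the function names (`wasserstein_1d`, `feature_distance`, `assign_group`), the fixed `threshold`, the equal-size sample requirement, and the choice to average per-dimension 1-D distances (a marginal approximation of the multivariate 1-Wasserstein distance) are all assumptions made for illustration. The invariant features are assumed to be precomputed lists of floats.

```python
from typing import List, Sequence

Features = List[List[float]]  # one invariant feature vector per observation


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """1-Wasserstein distance between two equal-size empirical 1-D samples.

    For equal-size samples this reduces to the mean absolute difference
    between the sorted values.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("this sketch assumes non-empty, equal-size samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def feature_distance(feats_a: Features, feats_b: Features) -> float:
    """Average of per-dimension 1-D distances (an assumed approximation)."""
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        total += wasserstein_1d([f[d] for f in feats_a], [f[d] for f in feats_b])
    return total / dims


def assign_group(new_feats: Features, group_buffers: List[Features],
                 threshold: float) -> int:
    """Return the index of the closest stored group.

    If no stored group lies within `threshold` (a hypothetical
    hyperparameter), the new features start a new group.
    """
    best_idx, best_dist = -1, float("inf")
    for idx, buf in enumerate(group_buffers):
        dist = feature_distance(new_feats, buf)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    if best_idx < 0 or best_dist > threshold:
        group_buffers.append([list(f) for f in new_feats])
        return len(group_buffers) - 1
    return best_idx
```

In this sketch, a batch of features that lies close to a stored group reuses that group's index (and so, by assumption, its policy), while a batch that is far from every stored group opens a new one.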
Student First Author: yes
Supplementary Material: zip
Instructions: I have read the instructions for authors (https://corl2023.org/instructions-for-authors/)
Publication Agreement: pdf