CurrMask: Learning Versatile Skills with Automatic Masking Curricula

19 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: reinforcement learning, unsupervised pretraining, masked prediction, curriculum learning
TL;DR: We propose a curriculum masked prediction approach for unsupervised RL pretraining that is effective in learning versatile and reusable skills.
Abstract: Recent research in reinforcement learning (RL) has shown a growing trend towards the pretraining paradigm, where a unified model pretrained on diverse and unlabeled data can be quickly adapted to various downstream tasks. Inspired by advances in other domains, masked prediction provides a generic abstraction for pretraining on decision-making data by masking part of the trajectory and predicting the missing inputs. Despite the versatility of masked prediction, it remains unclear how to balance the learning of reusable skills at different levels of complexity. To this end, we propose CurrMask, a curriculum masking approach that adjusts its masking scheme for learning diverse and versatile skills. The main idea behind CurrMask is that masking schemes with different block sizes and mask ratios create varying levels of temporal granularity. By explicitly combining them in a meaningful order, CurrMask can better capture both local dynamics and global dependencies. To achieve this, CurrMask uses a multi-armed bandit algorithm to find a curriculum over masking schemes that maximizes overall learning progress during training. Through extensive experiments, we show that CurrMask exhibits superior finetuning performance on offline RL tasks and zero-shot performance on goal-conditioned planning and skill prompting tasks. Additionally, our analysis reveals that CurrMask gradually increases the complexity of its masking scheme, encouraging the model to capture both short-term and long-term dependencies.
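Illustrative sketch: the abstract describes block-wise masking schemes parameterized by block size and mask ratio, with a multi-armed bandit selecting among them based on learning progress. The snippet below is a minimal, hypothetical rendering of that idea; the scheme grid, the choice of Exp3 as the bandit, and the learning-progress reward definition are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

# Hypothetical grid of masking schemes: (block_size, mask_ratio) pairs (assumed values).
SCHEMES = [(b, r) for b in (1, 5, 10) for r in (0.15, 0.5, 0.85)]

class Exp3:
    """Adversarial multi-armed bandit; each arm is one masking scheme."""
    def __init__(self, n_arms, gamma=0.1):
        self.gamma = gamma
        self.weights = np.ones(n_arms)

    def probs(self):
        w = self.weights / self.weights.sum()
        return (1 - self.gamma) * w + self.gamma / len(self.weights)

    def select(self, rng):
        p = self.probs()
        return rng.choice(len(p), p=p), p

    def update(self, arm, reward, p):
        # Importance-weighted update; reward is assumed scaled to [0, 1].
        self.weights[arm] *= np.exp(self.gamma * (reward / p[arm]) / len(self.weights))

def block_mask(seq_len, block_size, mask_ratio, rng):
    """Boolean mask over a trajectory: True marks timesteps to hide,
    chosen as contiguous blocks of length `block_size`."""
    mask = np.zeros(seq_len, dtype=bool)
    n_blocks = int(np.ceil(seq_len * mask_ratio / block_size))
    starts = rng.choice(max(seq_len - block_size + 1, 1), size=n_blocks, replace=True)
    for s in starts:
        mask[s:s + block_size] = True
    return mask

# Sketch of one training round (model/loss functions are placeholders):
# 1) arm, p = bandit.select(rng); block_size, ratio = SCHEMES[arm]
# 2) mask a batch of trajectories with block_mask(...) and train the
#    masked-prediction model on the hidden inputs
# 3) reward the bandit with the clipped drop in prediction loss under
#    that scheme (one possible "learning progress" signal).
```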
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1810