In the ZONE: Measuring difficulty and progression in curriculum generation

Published: 01 Feb 2023, Last Modified: 13 Feb 2023, Submitted to ICLR 2023
Keywords: curriculum learning, multiagent, Bayesian
TL;DR: This work proposes a Bayesian computational framework to operationalize ``the zone of proximal development'' and to improve existing curriculum generation algorithms.
Abstract: A common strategy in curriculum generation for reinforcement learning is to train a teacher network to generate tasks that fall within a student network's ``zone of proximal development'' (ZPD). These are tasks that are not too easy and not too hard for the student. Though intuitive, ZPD is not well understood computationally. We propose ZONE, a novel computational framework that operationalizes ZPD. It formalizes ZPD through the language of Bayesian probability theory, revealing that tasks should be selected by difficulty (the student's success probability on the task) and learning progression (the degree of change in the student's model parameters). ZONE operationalizes ZPD with two techniques that we apply on top of existing algorithms: REJECT, which rejects tasks outside a difficulty scope, and GRAD, which prioritizes tasks that maximize the student's gradient norm. Compared to the original algorithms, the ZONE techniques improve the student's generalization performance on discrete Minigrid environments and continuous control Mujoco domains, with up to $9\times$ higher success. ZONE also accelerates the student's learning by training on up to $10\times$ less data.
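
The two techniques named in the abstract lend themselves to a short sketch. The Python code below is an illustrative sketch only, not the paper's implementation: estimate_success_prob and task_loss are hypothetical stand-ins for the rollout-based estimators and per-task losses a concrete curriculum system would supply, and the thresholds p_min and p_max are assumed difficulty bounds.

# Illustrative sketch only; not the authors' released code.
# `estimate_success_prob` and `task_loss` are hypothetical stand-ins.
import torch
import torch.nn as nn


def reject_filter(tasks, student, estimate_success_prob, p_min=0.1, p_max=0.9):
    """REJECT (as described in the abstract): keep only tasks whose estimated
    success probability for the current student lies inside [p_min, p_max]."""
    kept = []
    for task in tasks:
        p = estimate_success_prob(student, task)  # e.g., fraction of successful rollouts
        if p_min <= p <= p_max:
            kept.append(task)
    return kept


def grad_priority(tasks, student, task_loss):
    """GRAD (as described in the abstract): rank candidate tasks by the norm of
    the gradient they induce in the student's parameters, largest first."""
    scores = []
    for task in tasks:
        student.zero_grad()
        loss = task_loss(student, task)  # surrogate loss on this task
        loss.backward()
        norm = torch.sqrt(sum(p.grad.pow(2).sum()
                              for p in student.parameters() if p.grad is not None))
        scores.append((norm.item(), task))
    return [task for _, task in sorted(scores, key=lambda s: s[0], reverse=True)]


if __name__ == "__main__":
    # Toy student and toy "tasks" (random feature vectors) purely to make the
    # sketch runnable; a real setup would use rollouts in Minigrid / Mujoco.
    torch.manual_seed(0)
    student = nn.Linear(4, 1)
    tasks = [torch.randn(4) for _ in range(8)]

    est = lambda s, t: torch.sigmoid(s(t)).item()      # stand-in success-probability estimate
    loss_fn = lambda s, t: (s(t) - 1.0).pow(2).mean()  # stand-in per-task loss

    candidates = reject_filter(tasks, student, est)
    ordered = grad_priority(candidates, student, loss_fn)
    print(f"{len(ordered)} tasks kept; training would proceed on the top-ranked ones.")

In this reading, REJECT acts as a filter on the sampled task pool and GRAD as a ranking over whatever survives the filter, which matches the abstract's framing of both as layers on top of an existing curriculum algorithm rather than a new sampler.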
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
Supplementary Material: zip