- Keywords: Curriculum Learning, Reinforcement Learning, Self-Paced Learning
- Abstract: Curriculum reinforcement learning (CRL) allows solving complex tasks by generating a tailored sequence of learning tasks, starting from easy ones and subsequently increasing their difficulty. However, the generation of such task sequences is largely governed by application-specific assumptions, often preventing a theoretical investigation of existing approaches. Recently, Klink et al. (2021) showed how self-paced learning induces a principled interpolation between task distributions in the context of RL, resulting in high learning performance. So far, this interpolation is unfortunately limited to Gaussian distributions. Here, we show that, on the one hand, this parametric restriction is insufficient in many learning cases but that, on the other hand, the interpolation of self-paced RL (SPRL) can be degenerate when not restricted to this parametric form. We show that introducing concepts from optimal transport into SPRL prevents the aforementioned issues. Experiments demonstrate that the resulting metric structure in the curriculum allows for a well-behaved non-parametric version of SPRL that leads to stable learning performance across tasks.
- One-sentence Summary: We investigate Self-Paced Reinforcement Learning (SPRL), a Curriculum Reinforcement Learning (CRL) algorithm, showing that its theoretical definition can be ill-suited for CRL and examining a potential solution to this problem.
- Supplementary Material: zip