Abstract: In curriculum learning, teaching involves the cooperative, planned selection of sequences of data to facilitate efficient and effective learning.
One-off cooperative selection of data has been mathematically formalized as entropy-regularized optimal transport, and the limiting behavior of myopic sequential interactions has been analyzed, both yielding theoretical and practical guarantees.
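To make the one-off formalization concrete, the minimal sketch below applies Sinkhorn scaling, the standard alternating-normalization solver for entropy-regularized optimal transport, to a hypothetical data-by-hypothesis matrix `M`. It is illustrative only, not the paper's implementation; the matrix size, seed, and iteration count are assumptions.

```python
import numpy as np

def sinkhorn_scaling(M, n_iters=50):
    """Alternating row/column normalization (Sinkhorn scaling) of a positive matrix.

    Rows index data, columns index hypotheses. Row normalization yields a
    learner-style distribution over hypotheses per datum; column normalization
    yields a teacher-style distribution over data per hypothesis. The fixed
    point corresponds to the entropy-regularized optimal transport plan.
    """
    P = M.astype(float).copy()
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)  # learner: each row sums to 1
        P = P / P.sum(axis=0, keepdims=True)  # teacher: each column sums to 1
    return P

# Toy example with a hypothetical 3-data-by-3-hypothesis likelihood matrix.
rng = np.random.default_rng(0)
M = rng.random((3, 3))
print(sinkhorn_scaling(M))
```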
We recast sequential cooperation with curriculum planning in a reinforcement learning framework and analyze performance mathematically and by simulation.
We prove that infinite-length plans are equivalent to not planning under certain assumptions on the method of planning, and isolate instances where monotonicity, and hence convergence in the limit, hold, as well as cases where they do not. We also demonstrate through simulations that argmax data selection is the same across planning horizons, and that learning exhibits problem-dependent sensitivity to the teacher's planning horizon. Thus, we find that planning ahead yields efficiency at the cost of effectiveness. This failure of alignment is illustrated in particular with grid-world examples in which the teacher must attempt to steer the learner away from a particular location in order to reach the desired grid square. We conclude with implications and directions for efficient and effective curricula.
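As a rough illustration of sequential cooperation with a planning horizon (a sketch under assumed dynamics, not the paper's code), the example below pairs a Bayesian learner with a teacher that exhaustively searches `horizon` steps ahead for the datum maximizing the target hypothesis' posterior. Setting `horizon=1` recovers myopic selection, and comparing horizons mirrors the argmax-stability observation above; all names and the toy problem are hypothetical.

```python
import numpy as np

def learner_update(belief, likelihood, d):
    """Bayesian update of the learner's belief over hypotheses after seeing datum d."""
    post = belief * likelihood[d]
    return post / post.sum()

def teacher_plan(belief, likelihood, target, horizon):
    """Pick the datum maximizing the target hypothesis' posterior after
    `horizon` rounds of simulated teaching (exhaustive lookahead)."""
    n_data = likelihood.shape[0]

    def value(b, depth):
        if depth == 0:
            return b[target]
        return max(value(learner_update(b, likelihood, d), depth - 1)
                   for d in range(n_data))

    scores = [value(learner_update(belief, likelihood, d), horizon - 1)
              for d in range(n_data)]
    return int(np.argmax(scores))

# Hypothetical toy problem: 4 data points, 3 hypotheses.
rng = np.random.default_rng(1)
likelihood = rng.random((4, 3))  # likelihood[d, h] proportional to P(d | h)
prior = np.ones(3) / 3
print(teacher_plan(prior, likelihood, target=0, horizon=1))  # myopic selection
print(teacher_plan(prior, likelihood, target=0, horizon=3))  # 3-step plan
```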
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=tIdBsdyxG3
Changes Since Last Submission: We have added further simulations that plan more steps ahead, as a previous reviewer indicated this would be helpful for estimating the optimal plan (however, we were unable to simulate the suggested 50 steps ahead due to the computation time required). We have also fixed a few minor typos, and have clarified and simplified the notation where possible.
Assigned Action Editor: ~Amir-massoud_Farahmand1
Submission Number: 652