Keywords: reinforcement learning, transfer learning
TL;DR: We show how learning an undo map between MDPs enables efficient policy transfer under linear state space transformations.
Abstract: Transfer learning in reinforcement learning (RL) has shown strong empirical success. In this work, we take a more principled perspective by studying when and how transferring knowledge between MDPs can be provably beneficial. Specifically, we consider the case where there exists a linear undo map between two MDPs (a source and a target), such that applying this map to the target’s state space recovers the source exactly. We propose an algorithm that learns this map via linear regression on state feature statistics gathered from both MDPs, and then uses it to obtain the target policy in a zero-shot manner from the source policy. Theoretically, we show that for linear MDPs, our approach has strictly better sample complexity than learning from scratch. Empirically, we demonstrate that these benefits extend beyond the linear setting: on challenging continuous control tasks, our method achieves significantly improved sample efficiency. Overall, our results highlight how shared structure between tasks can be leveraged to make learning more efficient.
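As a rough illustration of the kind of procedure the abstract describes (a minimal sketch, not the authors' exact algorithm), the snippet below fits a linear undo map by least-squares regression on aligned state feature statistics from the two MDPs and then composes it with the source policy to get a zero-shot target policy. The function names, the assumption that the feature matrices are row-aligned, and the use of `numpy.linalg.lstsq` are all illustrative choices, not details from the paper.

```python
import numpy as np

def fit_undo_map(target_feats: np.ndarray, source_feats: np.ndarray) -> np.ndarray:
    """Fit a linear map M (by least squares) such that M @ s_target approximates
    the corresponding source state, i.e. target_feats @ M.T ~= source_feats.
    Rows of the two matrices are assumed to be aligned state feature vectors."""
    M_T, *_ = np.linalg.lstsq(target_feats, source_feats, rcond=None)
    return M_T.T

def transfer_policy(source_policy, undo_map: np.ndarray):
    """Zero-shot target policy: undo the state transformation, then act
    with the (already trained) source policy."""
    return lambda target_state: source_policy(undo_map @ target_state)

# Hypothetical usage (data collection not shown):
# undo_map = fit_undo_map(target_feats, source_feats)
# pi_target = transfer_policy(pi_source, undo_map)
```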
Submission Number: 16