Abstract: The problem of domain transfer under large dynamics shift commonly arises when applying offline reinforcement learning (RL) to real-world applications, where a dataset collected in a source domain must be reused to accelerate training an agent for a target domain. Large dynamics shift occurs when the target domain's environment changes unpredictably. Existing works typically assume that every state-action pair in the target domain is covered by the source domain, which is often unrealistic and restricts transfer to small dynamics shifts. To tackle large dynamics shift, we propose using the source data not only for offline policy training but also for safe and efficient data collection in the target domain, thereby relaxing the coverage requirement. Specifically, the source data plays two roles: it serves as augmentation data, with rewards modified to compensate for the difference in dynamics, and it provides prior knowledge for a behaviour policy that safely and efficiently collects a small amount of new data in the target domain. The target domain policy is then trained with offline RL on the source data together with the modest amount of newly collected target data. We evaluate our method in gridworld and autonomous driving environments. Results show that it requires less target domain data and collects that data more safely than prior methods.
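The abstract does not specify the form of the reward modification. A common choice in prior cross-domain offline RL work is to penalize source transitions in proportion to the estimated dynamics gap, i.e. the log-likelihood ratio of the transition under the source versus target dynamics. The sketch below illustrates that idea; the function name, the penalty weight `eta`, and the assumption that transition log-likelihoods are available (e.g. from learned dynamics models or a domain classifier) are all illustrative, not the paper's actual method.

```python
def modified_reward(r, log_p_src, log_p_tgt, eta=1.0):
    """Down-weight source-domain transitions whose dynamics disagree with the target.

    r         : reward observed in the source domain
    log_p_src : log-likelihood of the transition (s, a, s') under the source dynamics
    log_p_tgt : log-likelihood of the same transition under the target dynamics
    eta       : penalty weight (hypothetical hyperparameter)

    When the two dynamics agree, the penalty vanishes and the reward is unchanged;
    transitions far more likely in the source than in the target are penalized.
    """
    return r - eta * (log_p_src - log_p_tgt)


# Transition equally likely in both domains: reward is unchanged.
print(modified_reward(1.0, -2.0, -2.0))   # 1.0
# Transition much more likely under source dynamics: reward is penalized.
print(modified_reward(1.0, -1.0, -3.0))   # -1.0
```

In practice, the log-likelihoods would be estimated from data, e.g. with learned forward models for each domain, and `eta` would be tuned per task.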
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)