Keywords: reinforcement learning
Abstract: Online deep reinforcement learning (DRL) suffers from sample inefficiency. This inefficiency makes it difficult to train effective policy models for complex tasks and demands substantial time and computing resources. As trained policy models can be transferred to other applications, protecting their intellectual property (IP) has become a pressing issue. To address this, we need to prevent unauthorized transfers for IP protection while maintaining transferability for future scalability. We propose the first Transfer-Controllable Reinforcement Learning (TCRL) framework. It has two key components: the Environment Randomization module, which randomly generates unauthorized target-domain environments, and the Transfer-Controllable module, which trains a policy model on the source-domain environment and these unauthorized target-domain environments. The resulting model resists transfer in unauthorized settings yet remains transferable in authorized ones. We validate the framework's effectiveness across various reinforcement learning environments and algorithms. The policy model trained by our framework is hard to transfer to similar unauthorized target-domain environments but achieves source-domain-like performance in authorized ones. In the MuJoCo environment, our trained policy model attains 98.78% of the source-domain performance in authorized target-domain environments and only 50.38% in unauthorized ones.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 24247