Keywords: reinforcement learning
Abstract: Online deep reinforcement learning (DRL) suffers from sample inefficiency. This inefficiency makes it difficult to train effective policy models for complex tasks and demands substantial time and computing resources. As trained policy models can be transferred to other applications, protecting their intellectual property (IP) has become a pressing issue. To address this, we need to prevent unauthorized transfers for IP protection while maintaining transferability for future scalability. We propose the first Transfer-Controllable Reinforcement Learning (TCRL) framework. It has two key components: the Environment Randomization module, which randomly generates unauthorized target-domain environments, and the Transfer-Controllable module, which trains a policy model on the source-domain environment and these unauthorized target-domain environments. The resulting model resists transfer in unauthorized settings yet remains transferable in authorized ones. We validate the framework's effectiveness across various reinforcement learning environments and algorithms. The policy model trained by our framework is hard to transfer to similar unauthorized target-domain environments but achieves source-domain-like performance in authorized ones. In the MuJoCo environment, our trained policy model attains 98.78% of the source-domain performance in authorized target-domain environments and only 50.38% in unauthorized ones.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 24247