Mitigating Conflicts in Multi-Task Reinforcement Learning via Progressively-Trained Dynamic Policy Network
Keywords: reinforcement learning, continual learning, multi-task learning
Abstract: Reinforcement learning is widely applied in fields such as game playing, robotic control, and autonomous driving. However, we find that when a standard reinforcement learning algorithm is trained on multiple tasks with inter-task conflicts, it may yield limited performance on the individual tasks. To mitigate this, we first introduce a dynamic policy network that incorporates diverse computational pathways of varying depths, along with gating modules that selectively activate the appropriate pathways for different tasks. This more flexible design allows the network to achieve improved multi-task performance. Second, we propose a progressive training technique that mitigates inter-task conflicts by leveraging a proper training order and continual learning techniques. Combining the dynamic policy network with progressive training, we successfully train a single policy capable of performing seven quadrupedal locomotion tasks and another policy that achieves an improved final average reward on ten MiniHack games.
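To make the described architecture concrete, below is a minimal sketch, not the authors' implementation, of a policy network with pathways of varying depths and a task-conditioned gating module. All module names, layer sizes, and the use of soft (softmax-weighted) gating are illustrative assumptions; the paper's actual design may differ.

```python
# Illustrative sketch only: a gated multi-pathway policy network.
# Sizes, depths, and the soft-gating scheme are assumptions, not the paper's spec.
import torch
import torch.nn as nn


def mlp(in_dim: int, hidden: int, depth: int, out_dim: int) -> nn.Sequential:
    """Build one MLP pathway of the given depth."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)


class DynamicPolicy(nn.Module):
    """Several pathways of different depths; a gate conditioned on the
    observation and a task embedding softly activates the pathways."""

    def __init__(self, obs_dim: int, act_dim: int, num_tasks: int,
                 depths=(1, 2, 4), hidden: int = 256, task_emb: int = 16):
        super().__init__()
        self.pathways = nn.ModuleList(
            [mlp(obs_dim, hidden, d, act_dim) for d in depths]
        )
        self.task_embedding = nn.Embedding(num_tasks, task_emb)
        # Gate maps observation + task embedding to per-pathway weights.
        self.gate = nn.Sequential(
            nn.Linear(obs_dim + task_emb, hidden), nn.ReLU(),
            nn.Linear(hidden, len(depths)),
        )

    def forward(self, obs: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        emb = self.task_embedding(task_id)                       # (B, task_emb)
        weights = torch.softmax(self.gate(torch.cat([obs, emb], -1)), -1)
        outs = torch.stack([p(obs) for p in self.pathways], 1)   # (B, P, act_dim)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)         # (B, act_dim)


# Usage: action outputs for a small batch drawn from two different tasks.
policy = DynamicPolicy(obs_dim=48, act_dim=12, num_tasks=7)
obs = torch.randn(2, 48)
task_id = torch.tensor([0, 3])
actions = policy(obs, task_id)
```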
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 15080