Disentangled Code Embedding for Multi-Task Reinforcement Learning: A Dual-Encoder Approach with Dynamic Gating

ICLR 2026 Conference Submission25534 Authors

20 Sept 2025 (modified: 08 Oct 2025)
License: CC BY 4.0
Keywords: Dynamic Gating
Abstract: We propose a disentangled code embedding module (DCEM) for multi-task reinforcement learning (RL) that explicitly separates task-agnostic and task-specific features in code representations to achieve better generalization across diverse tasks. The module uses a dual-encoder architecture: a transformer-based task-agnostic encoder captures universal programming patterns, while a graph neural network extracts task-specific features from abstract syntax trees. A dynamic gating mechanism then combines these features according to the task context, enabling the RL agent to balance shared and specialized knowledge. Coupling the embedding space produced by DCEM with the RL policy and value networks allows the agent to ground its decisions in structured code embeddings, supporting task-aware decision making. Moreover, the module is pre-trained with contrastive and reconstruction losses to ensure strong feature extraction before fine-tuning with the RL objective. By disentangling and recombining code features at run time, our approach mitigates catastrophic interference in multi-task RL, in contrast to prior work that relies on monolithic embeddings. Experiments show that DCEM significantly improves cross-task generalization while remaining computationally efficient. The proposed approach offers a principled way to exploit structured code representations in RL, with potential applications to automated programming assistants, remote robot control, and other domains requiring adaptive task understanding.
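To make the gating step concrete, the following is a minimal PyTorch sketch of how a context-dependent gate might blend the two encoder outputs before the policy and value heads. The abstract does not specify the implementation; the class name DynamicGate, the sigmoid gate conditioned on both feature streams, and all dimensions are illustrative assumptions, not the authors' published code.

import torch
import torch.nn as nn

class DynamicGate(nn.Module):
    """Blend task-agnostic and task-specific code embeddings.
    Hypothetical sketch; names and design choices are assumptions."""

    def __init__(self, dim: int):
        super().__init__()
        # Gate conditioned on both feature streams (assumed design).
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.Sigmoid(),
        )

    def forward(self, z_agnostic: torch.Tensor, z_specific: torch.Tensor) -> torch.Tensor:
        # g in (0, 1)^dim decides, per feature, how much specialized
        # knowledge to mix into the shared representation.
        g = self.gate(torch.cat([z_agnostic, z_specific], dim=-1))
        return g * z_specific + (1.0 - g) * z_agnostic

# Usage: fuse the two encoder outputs before the policy/value networks.
dim = 256
gate = DynamicGate(dim)
z_agnostic = torch.randn(8, dim)   # e.g. transformer encoder output
z_specific = torch.randn(8, dim)   # e.g. AST graph-network output
fused = gate(z_agnostic, z_specific)  # shape: (8, 256)

A per-feature sigmoid gate of this kind lets the agent fall back on shared programming knowledge when a task looks familiar and lean on AST-derived specifics otherwise, which matches the balancing behavior the abstract describes.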
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 25534