Keywords: Multi-Modal Fusion
Abstract: We propose Dynamic Contrastive Reinforcement Learning (DCRL), an end-to-end framework for adaptive code-text alignment with multi-modal fusion. The method addresses the limitations of static fusion approaches by dynamically tuning contrastive-learning parameters according to the reinforcement learning agent's performance, so that alignment quality tracks task proficiency. Unlike conventional methods that apply a fixed margin and a fixed temperature to the contrastive loss, DCRL reparameterizes the margin and temperature as functions of the agent's cumulative reward and task-completion rate, allowing the embedding space to shift from broad exploration to precise alignment as training progresses. The framework incorporates a cross-modal transformer that fuses code and text embeddings and feeds them into a policy network for downstream tasks such as code generation and text summarization.
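The abstract's core mechanism can be illustrated with a minimal sketch. The schedule below is an assumption for illustration only (the paper does not specify the exact functional form): a proficiency score blends the agent's cumulative reward and task-completion rate, temperature decreases and margin increases with proficiency, and both feed a margin-adjusted InfoNCE-style contrastive loss. All function names (`dynamic_params`, `contrastive_loss`) and constants are hypothetical.

```python
import math

def dynamic_params(cumulative_reward, completion_rate,
                   tau_max=1.0, tau_min=0.07,
                   margin_min=0.1, margin_max=0.5):
    """Hypothetical schedule mapping agent performance to contrastive parameters."""
    # Blend reward (squashed to [0, 1] via tanh) with completion rate in [0, 1].
    proficiency = 0.5 * (math.tanh(cumulative_reward) + completion_rate)
    proficiency = min(max(proficiency, 0.0), 1.0)
    # High temperature / small margin early (broad exploration);
    # low temperature / large margin once the agent is proficient (sharp alignment).
    tau = tau_max - (tau_max - tau_min) * proficiency
    margin = margin_min + (margin_max - margin_min) * proficiency
    return tau, margin

def contrastive_loss(pos_sim, neg_sims, tau, margin):
    """Margin-adjusted InfoNCE over one positive similarity and several negatives."""
    logits = [(pos_sim - margin) / tau] + [s / tau for s in neg_sims]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_z)
```

Under this sketch, an untrained agent (low reward, low completion) sees a high temperature and small margin, encouraging a diffuse embedding space, while a proficient agent sees the opposite, tightening code-text alignment.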
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 25491