Dynamic Task-Embedded Reward Machines for Adaptive Code Generation and Manipulation in Reinforcement Learning
Keywords: Reinforcement Learning
Abstract: We introduce the Dynamic Task-Embedded Reward Machine (DTERM), a reinforcement learning approach for code generation and code manipulation tasks. Conventional reward models typically rely on fixed weightings or manual tuning, which lack the flexibility to handle diverse coding tasks such as translation, completion, and repair. DTERM addresses this by dynamically modulating reward components with a hypernetwork-driven architecture that produces task-aware trade-offs among syntactic correctness, semantic correctness, and computational efficiency. The framework comprises three key modules: a transformer-based task embedding generator, a modular reward decomposer, and a hypernetwork that generates context-dependent weights for the sub-rewards.
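The sketch below illustrates the weighting scheme the abstract describes: a task embedding conditions a small hypernetwork whose output weights combine decomposed sub-rewards. It is a minimal illustration, not the paper's implementation; all module names, dimensions, and the softmax normalization are assumptions.

```python
# Hypothetical sketch of hypernetwork-driven reward weighting (assumed design,
# not the authors' code): a task embedding -> weights over sub-rewards
# (e.g. syntax, semantics, efficiency) -> scalar reward.
import torch
import torch.nn as nn


class RewardWeightHypernetwork(nn.Module):
    """Maps a task embedding to a simplex of sub-reward weights."""

    def __init__(self, embed_dim: int = 256, num_sub_rewards: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_sub_rewards),
        )

    def forward(self, task_embedding: torch.Tensor) -> torch.Tensor:
        # Softmax keeps the weights positive and summing to one (an assumption).
        return torch.softmax(self.net(task_embedding), dim=-1)


def combined_reward(task_embedding: torch.Tensor,
                    sub_rewards: torch.Tensor,
                    hypernet: RewardWeightHypernetwork) -> torch.Tensor:
    """Weighted sum of sub-rewards, e.g. [syntax, semantics, efficiency]."""
    weights = hypernet(task_embedding)          # (batch, num_sub_rewards)
    return (weights * sub_rewards).sum(dim=-1)  # (batch,)


# Usage with dummy tensors standing in for a transformer task encoder's output
# and per-sample sub-reward scores.
hypernet = RewardWeightHypernetwork()
task_emb = torch.randn(4, 256)
subs = torch.rand(4, 3)
print(combined_reward(task_emb, subs, hypernet).shape)  # torch.Size([4])
```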
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 25449