Dynamic Task-Embedded Reward Machines for Adaptive Code Generation and Manipulation in Reinforcement Learning
Keywords: Reinforcement Learning
Abstract: We introduce the Dynamic Task-Embedded Reward Machine (DTERM), a reinforcement learning approach for code generation and code manipulation tasks. Conventional reward models typically rely on fixed weightings or manual tuning, which lack the flexibility to handle diverse coding tasks such as translation, completion, and repair. DTERM addresses this by dynamically modulating reward components with a hypernetwork-driven architecture that produces task-aware trade-offs among syntactic correctness, semantic correctness, and computational efficiency. The framework comprises three key modules: a transformer-based task embedding generator, a modular reward decomposer, and a hypernetwork that generates context-dependent weights for the sub-rewards.
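The sketch below illustrates the weighting scheme the abstract describes: a task embedding conditions a small hypernetwork whose output weights combine decomposed sub-rewards. It is a minimal illustration, not the paper's implementation; all module names, dimensions, and the softmax normalization are assumptions.

```python
# Hypothetical sketch of hypernetwork-driven reward weighting (assumed design,
# not the authors' code): a task embedding -> weights over sub-rewards
# (e.g. syntax, semantics, efficiency) -> scalar reward.
import torch
import torch.nn as nn


class RewardWeightHypernetwork(nn.Module):
    """Maps a task embedding to a simplex of sub-reward weights."""

    def __init__(self, embed_dim: int = 256, num_sub_rewards: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_sub_rewards),
        )

    def forward(self, task_embedding: torch.Tensor) -> torch.Tensor:
        # Softmax keeps the weights positive and summing to one (an assumption).
        return torch.softmax(self.net(task_embedding), dim=-1)


def combined_reward(task_embedding: torch.Tensor,
                    sub_rewards: torch.Tensor,
                    hypernet: RewardWeightHypernetwork) -> torch.Tensor:
    """Weighted sum of sub-rewards, e.g. [syntax, semantics, efficiency]."""
    weights = hypernet(task_embedding)          # (batch, num_sub_rewards)
    return (weights * sub_rewards).sum(dim=-1)  # (batch,)


# Usage with dummy tensors standing in for a transformer task encoder's output
# and per-sample sub-reward scores.
hypernet = RewardWeightHypernetwork()
task_emb = torch.randn(4, 256)
subs = torch.rand(4, 3)
print(combined_reward(task_emb, subs, hypernet).shape)  # torch.Size([4])
```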
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 25449