Keywords: Energy-Efficient Locomotion, Reinforcement Learning
TL;DR: We propose a simple, hyperparameter-free gradient optimization method that minimizes energy use while maintaining task performance in locomotion.
Abstract: Efficient robot locomotion often requires balancing task performance with energy expenditure. A common approach in reinforcement learning (RL) is to penalize energy use directly in the reward function. This requires carefully weighting the reward terms to avoid undesirable trade-offs, where energy minimization harms task success or vice versa. In this work, we propose a hyperparameter-free gradient optimization method that minimizes energy without conflicting with task performance. Inspired by recent work in multitask learning, our method applies policy gradient projection between the task and energy objectives to promote non-conflicting updates. We evaluate this technique on standard locomotion benchmarks from DM-Control and HumanoidBench and demonstrate a 64% reduction in energy usage while maintaining comparable task performance. Further, we conduct experiments on a Unitree GO2 quadruped, showcasing Sim2Real transfer of energy-efficient policies. Our method is easy to implement in standard RL pipelines with minimal code changes, and offers a principled alternative to reward shaping for energy-efficient control policies.
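For intuition, the following is a minimal PyTorch sketch of the kind of PCGrad-style gradient projection the abstract describes: when the energy gradient conflicts with the task gradient (negative inner product), its conflicting component is removed before the update. The function names, the flattened-gradient treatment, and the plain SGD step are illustrative assumptions, not the authors' implementation.

```python
import torch

def flat_grad(loss, params):
    # Gradient of `loss` w.r.t. `params`, flattened into one vector.
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def projected_update(task_loss, energy_loss, params, lr=3e-4):
    """One gradient step that minimizes energy without opposing the task.

    Assumes `task_loss` and `energy_loss` are scalar losses differentiable
    w.r.t. the policy parameters `params` (e.g. negative policy-gradient
    surrogates for the task reward and the energy penalty).
    """
    g_task = flat_grad(task_loss, params)
    g_energy = flat_grad(energy_loss, params)

    dot = torch.dot(g_task, g_energy)
    if dot < 0:
        # Conflict: project the energy gradient onto the normal plane of
        # the task gradient, keeping only its non-conflicting component.
        g_energy = g_energy - (dot / g_task.norm().pow(2)) * g_task

    g = g_task + g_energy  # combined, non-conflicting update direction
    offset = 0
    with torch.no_grad():
        for p in params:
            n = p.numel()
            p -= lr * g[offset:offset + n].view_as(p)
            offset += n
```

Because the projection is fully determined by the two gradients, no weighting coefficient between the task and energy terms needs to be tuned, which is what makes the approach hyperparameter-free.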
Submission Number: 995