Keywords: AI for Science
TL;DR: DRL-based quadrupedal control that promotes stable trotting while minimizing the Cost of Transport.
Abstract: Quadrupedal locomotion remains a complex control challenge, particularly when energy efficiency is considered. Recent advances in Deep Reinforcement Learning (DRL) offer a promising framework for automating the synthesis of low-level controllers from sensory input. In this work, we develop a DRL-based scheme for energy-efficient trotting on the Laelaps II quadruped, simulated in MuJoCo. A high-fidelity model of the robot is constructed, with emphasis on accurately capturing drivetrain dynamics. We design a reward function and action space that promote stable trotting while minimizing the Cost of Transport (CoT). The proposed method, initially presented in [1], demonstrates improved energy efficiency during trotting on level terrain, replicating treadmill-like conditions at NTUA’s Control Systems Lab.
I. INTRODUCTION
Quadrupedal locomotion demands rapid reflexes, coordinated leg control, precise force handling, and robust balance, traditionally requiring extensive manual tuning. Model-free DRL methods effectively learn locomotion directly from experience [2], yet often overlook energy efficiency [3], penalize joint accelerations without considering mechanical antagonism [4], and struggle with coordinated gait patterns like trotting due to increased data needs and invalid configurations [5]. This work investigates energy-efficient trotting for the quadruped Laelaps II [1], focusing on forward locomotion on level terrain with reduced energy consumption.
II. METHODOLOGY
Initially, a highly realistic MuJoCo representation of the Laelaps II quadruped (Fig. 1 in [1]) was created, with all mechanical and electrical properties accurately reflecting the physical system. These properties were obtained from component datasheets and, where necessary, experimentally validated. To enable energy-efficient trotting, the robot’s energy consumption was incorporated into the reward function. The total actuation energy $E_{tot}$ is defined as the sum of the mechanical actuation energy $E_{act}$ (1) and the electrical losses $E_{el}$ (2), computed via integration of the respective power expressions. Numerical integration was performed using Simpson’s 1/3 rule over a given time interval $\Delta t = t_2 - t_1$ with timestep $dt$. The quantities $\tau_{m,i}$, $\dot{q}_{m,i}$, $R_{m,i}$, and $K_{T,i}$ denote the torque, angular velocity, winding resistance, and torque constant of the $i$-th motor, respectively.
$$E_{act} = \int_{t_1}^{t_2} \sum_{i=1}^{8} |\tau_{m,i} \dot{q}_{m,i}|dt \quad(1)$$
$$E_{el} = \int_{t_1}^{t_2} \sum_{i=1}^{8} \left[\left(\frac{\tau_{m,i}}{K_{T,i}}\right)^2 R_{m,i}\right] dt \quad(2)$$
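As a concrete illustration, the following minimal Python sketch evaluates (1) and (2) from logged per-motor torque and speed trajectories; the function name, array shapes, and the use of SciPy’s Simpson integrator are assumptions for illustration, not the implementation used on Laelaps II.

```python
# Illustrative sketch (not the authors' code): total actuation energy per (1)-(2),
# assuming torque/speed trajectories logged at a fixed timestep dt over [t1, t2].
import numpy as np
from scipy.integrate import simpson  # composite Simpson's 1/3 rule

def total_actuation_energy(tau, qdot, K_T, R, dt):
    """
    tau, qdot : (T, 8) arrays of motor torques [Nm] and angular velocities [rad/s]
    K_T, R    : (8,) arrays of torque constants [Nm/A] and winding resistances [Ohm]
    dt        : sampling timestep [s]
    Returns (E_act, E_el, E_tot) in Joules.
    """
    p_mech = np.sum(np.abs(tau * qdot), axis=1)    # integrand of (1): |tau * qdot| summed over motors
    p_el   = np.sum((tau / K_T) ** 2 * R, axis=1)  # integrand of (2): Joule losses in the windings
    E_act = simpson(p_mech, dx=dt)
    E_el  = simpson(p_el, dx=dt)
    return E_act, E_el, E_act + E_el
```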
In the reward function, a simplified version of the CoT is used (3), since the robot’s mass $m$ and the gravitational acceleration $g$ are constant. Furthermore, the energy term in (3) is normalized by the distance traversed since the beginning of the episode, $\Delta x_{ep}$, not only that of the current step. Finally, the total reward $rew_{tot}$ (6), computed at each agent timestep, combines a positive term promoting forward progression (4), a penalty for lateral deviation from the desired trajectory (5), and the energy-related CoT term.
$$rew_{en} = - w_{en} \, \frac{E_{tot}}{\Delta x_{ep} + \epsilon} \quad(3)$$
$$rew_{x} = w_x \left( | x_{now} - x_{previous} | \right) \quad(4)$$
$$rew_{y} = -w_y \left| \, |y_{now}| - |y_{previous}| \, \right| \quad(5)$$
$$ rew_{tot} = rew_x + rew_y + rew_{en} \quad(6)$$
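For clarity, a minimal sketch of how the per-step reward (3)-(6) could be assembled is shown below; the weights $w_x$, $w_y$, $w_{en}$, the constant $\epsilon$, and the function signature are placeholders for illustration, not the actual agent interface.

```python
# Illustrative sketch (not the authors' implementation) of the per-step reward (3)-(6).
def step_reward(x_now, x_prev, y_now, y_prev, E_tot, dx_episode,
                w_x=1.0, w_y=1.0, w_en=1.0, eps=1e-6):
    rew_x  = w_x * abs(x_now - x_prev)              # (4): forward progression in the current step
    rew_y  = -w_y * abs(abs(y_now) - abs(y_prev))   # (5): penalty for drifting laterally from the track
    rew_en = -w_en * E_tot / (dx_episode + eps)     # (3): simplified CoT over the episode so far
    return rew_x + rew_y + rew_en                   # (6): total reward at this agent timestep
```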
III. RESULTS
Similar approaches have also tried to reduce energy consumption by penalizing joint accelerations or the mechanical part of the actuation power during a gait, but not the drivetrain’s total energy demand. After training and testing these alternatives on Laelaps II, our approach still achieved the lowest CoT, i.e., 1.89 (Fig. 8b in [1]).
REFERENCES
[1] A. Mastrogeorgiou, A. Papatheodorou, K. Koutsoukis, and E. Papadopoulos, “Learning energy-efficient trotting for legged robots,” in Robotics in Natural Settings, Lecture Notes in Networks and Systems, vol. 530, Springer, 2022, pp. 204-215.
[2] X. B. Peng, E. Coumans, T. Zhang, T.-W. E. Lee, J. Tan, and S. Levine, “Learning Agile Robotic Locomotion Skills by Imitating Animals,” in Robotics: Science and Systems, July 2020.
[3] J. Tan et al., “Sim-to-Real: Learning Agile Locomotion For Quadruped Robots,” arXiv, 2018, doi: 10.48550/ARXIV.1804.10332.
[4] K. Koutsoukis and E. Papadopoulos, “On the Effect of Robotic Leg Design on Energy Efficiency,” in Proc. IEEE International Conference on Robotics and Automation (ICRA ’21), Xi’an, China, May 30-June 5, 2021.
[5] A. Iscen et al., “Policies Modulating Trajectory Generators,” arXiv, 2019, doi: 10.48550/ARXIV.1910.02812.
Submission Number: 123