Visual Navigation for Biped Humanoid Robots Using Deep Reinforcement Learning

Published: 01 Jan 2018 · Last Modified: 27 Sep 2024 · IEEE Robotics and Automation Letters, 2018 · CC BY-SA 4.0
Abstract: In this letter, we propose a map-less visual navigation system for biped humanoid robots, which extracts information from color images to derive motion commands using deep reinforcement learning (DRL). The map-less visual navigation policy is trained using Deep Deterministic Policy Gradients (DDPG), an actor-critic DRL algorithm. The algorithm is implemented using two separate networks with similar structures, one for the actor and one for the critic. In addition to convolutional and fully connected layers, Long Short-Term Memory (LSTM) layers are included to address the limited observability of the problem. As a proof of concept, we consider the case of robotic soccer using NAO V5 humanoid robots, which have reduced computational capabilities and use low-cost Red-Green-Blue (RGB) cameras as their main sensors. The use of DRL allowed us to obtain a complex, high-performing policy from scratch, without any prior knowledge of the domain or of the dynamics involved. The visual navigation policy is trained in a robotic simulator and then successfully transferred to a physical robot, where it runs in 20 ms, allowing its use in real-time applications.
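The DDPG algorithm mentioned in the abstract trains the critic against a Bellman target computed from the target networks and slowly tracks the online networks via Polyak averaging. A minimal sketch of these two updates is shown below; the hyperparameter values and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

GAMMA = 0.99   # discount factor (typical DDPG value; assumed, not from the paper)
TAU = 0.001    # soft-update rate for the target networks (assumed)

def critic_target(reward, next_q, done, gamma=GAMMA):
    """Bellman target y = r + gamma * Q'(s', mu'(s')) used to train the critic.

    next_q is the target critic evaluated at the target actor's action;
    done masks out the bootstrap term at episode boundaries."""
    return reward + gamma * next_q * (1.0 - done)

def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]

# Toy usage with scalar placeholders instead of real network outputs:
y = critic_target(reward=1.0, next_q=2.0, done=0.0)   # 1.0 + 0.99 * 2.0 = 2.98
new_targets = soft_update([np.zeros(3)], [np.ones(3)])  # each entry moves to 0.001
```

In the paper's setting, the actor and critic are separate networks with convolutional, LSTM, and fully connected layers; the sketch above only captures the update rules that sit on top of whatever function approximators are used.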