XBG: End-to-End Imitation Learning for Autonomous Behaviour in Human-Robot Interaction and Collaboration

Published: 01 Jan 2024 · Last Modified: 15 May 2025 · IEEE Robotics and Automation Letters, 2024 · CC BY-SA 4.0
Abstract: This letter presents XBG (eXteroceptive Behaviour Generation), a multimodal end-to-end Imitation Learning (IL) system enabling whole-body autonomous behaviour on humanoid robots in real-world Human-Robot Interaction (HRI) scenarios. The main contribution is an architecture for learning HRI behaviours with a data-driven approach. A diverse dataset is collected via teleoperation, covering multiple HRI scenarios such as handshaking, handwaving, payload reception, walking, and walking with a payload. After synchronizing, filtering, and transforming the data, we show how to train the presented Deep Neural Network (DNN), which integrates exteroceptive and proprioceptive information so that the robot understands both its environment and its own actions. The network takes sequences of RGB images, depth images, and joint-state information as input and generates the robot's response. By fusing these multimodal signals over time, the model endows the robotic platform with autonomous capabilities. The models are deployed on the ergoCub humanoid robot and evaluated by their success rates in the aforementioned HRI scenarios. XBG achieves success rates between 60% and 100%, even when tested in unseen environments.
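To make the described temporal multimodal fusion concrete, below is a minimal PyTorch sketch of one plausible realization. The encoder layouts, the LSTM used for temporal fusion, the joint count `n_joints`, and the class name `XBGStyleNet` are all illustrative assumptions for this sketch, not the architecture published in the letter.

```python
# Illustrative sketch only: the abstract describes fusing RGB, depth, and
# joint-state sequences over time, but not the exact layers. All module
# sizes below are assumptions chosen to mirror that description.
import torch
import torch.nn as nn


class XBGStyleNet(nn.Module):
    """Multimodal behaviour net: RGB + depth + proprioception -> joint commands."""

    def __init__(self, n_joints: int = 32, feat_dim: int = 128):
        super().__init__()
        # Separate convolutional encoders for the two exteroceptive streams.
        self.rgb_enc = self._make_cnn(in_ch=3, out_dim=feat_dim)
        self.depth_enc = self._make_cnn(in_ch=1, out_dim=feat_dim)
        # Small MLP for the proprioceptive joint state.
        self.joint_enc = nn.Sequential(
            nn.Linear(n_joints, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # Temporal fusion over the sequence of per-frame features.
        self.temporal = nn.LSTM(3 * feat_dim, feat_dim, batch_first=True)
        # Regress the next joint-position targets.
        self.head = nn.Linear(feat_dim, n_joints)

    @staticmethod
    def _make_cnn(in_ch: int, out_dim: int) -> nn.Module:
        return nn.Sequential(
            nn.Conv2d(in_ch, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, out_dim),
        )

    def forward(self, rgb, depth, joints):
        # rgb: (B, T, 3, H, W), depth: (B, T, 1, H, W), joints: (B, T, n_joints)
        b, t = rgb.shape[:2]
        f_rgb = self.rgb_enc(rgb.flatten(0, 1)).view(b, t, -1)
        f_dep = self.depth_enc(depth.flatten(0, 1)).view(b, t, -1)
        f_jnt = self.joint_enc(joints)
        fused, _ = self.temporal(torch.cat([f_rgb, f_dep, f_jnt], dim=-1))
        return self.head(fused[:, -1])  # command for the latest timestep


# Behaviour-cloning step on a synthetic teleoperation batch: regress the
# operator's joint commands with an MSE loss, as in standard end-to-end IL.
if __name__ == "__main__":
    net = XBGStyleNet()
    rgb = torch.randn(2, 8, 3, 96, 96)
    depth = torch.randn(2, 8, 1, 96, 96)
    joints = torch.randn(2, 8, 32)
    target = torch.randn(2, 32)
    loss = nn.functional.mse_loss(net(rgb, depth, joints), target)
    loss.backward()
    print(loss.item())
```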