Reward-based Autonomous Online Learning Framework for Resilient Cooperative Target Monitoring using a Swarm of Robots

TMLR Paper 2264 Authors

17 Feb 2024 (modified: 01 Mar 2024) · Under review for TMLR
Abstract: This paper addresses the problem of decentralized cooperative monitoring of an agile target using a swarm of robots undergoing dynamic sensor failures. Each robot is equipped with a proprioceptive sensor suite for estimating its own pose and an exteroceptive sensor suite, with a limited field of view, for target detection and position estimation. Further, the robots use broadcast-based communication modules with a limited communication radius and bandwidth. Uncertainty in the system and the environment can lead to intermittent communication link drops, loss of visual contact with the target, and large biases in the sensors' estimation output due to temporary or permanent failures. Robotic swarms often operate without leaders, supervisors, or landmarks, i.e., without ground-truth pose information. In such scenarios, each robot is required to exhibit autonomous learning by taking charge of its own learning process while making the most of the available information. In this regard, a novel Autonomous Online Learning (AOL) framework is proposed, in which a decentralized online learning mechanism driven by reward-like signals is intertwined with an implicit, adaptive, consensus-based, two-layered, weighted information fusion process that utilizes the robots' observations and their shared information, thereby ensuring resilience in the robotic swarm. To study the effect of loss or reward design in the local and social learning layers, three AOL variants are presented. A novel perturbation-greedy reward design is introduced in the learning layers of two variants, inducing exploration-exploitation in the weight space of their information fusion. A convergence analysis of the weights is carried out, showing that they converge under reasonable assumptions. Simulation results show that the AOL variant using the perturbation-greedy reward in its local learning layer performs best, improving on the baselines by $182.2\%$ to $652\%$ in detection score per robot and by $94.7\%$ to $150.4\%$ in closeness score per robot as the total number of robots is increased from $5$ to $30$. Further, AOL's Sim2Real implementation is validated in a ROS-Gazebo setup.
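The abstract does not specify the update rule behind the perturbation-greedy reward design, so the sketch below is only one plausible reading of the idea: each robot perturbs its fusion weights (exploration) and keeps the perturbation only if a reward-like signal improves (exploitation). The function names (`fuse`, `reward`, `perturbation_greedy_step`), the spread-based reward, and the step size `sigma` are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def fuse(weights, estimates):
    """Weighted fusion of target-position estimates (own + neighbours')."""
    w = weights / weights.sum()
    return w @ estimates  # convex combination of the position estimates

def reward(fused, estimates):
    """Assumed reward-like signal: negative spread of the individual
    estimates around the fused estimate (higher = more consistent)."""
    return -np.mean(np.linalg.norm(estimates - fused, axis=1))

def perturbation_greedy_step(weights, estimates, sigma=0.05, rng=None):
    """One perturbation-greedy update of a robot's fusion weights.

    Explore: add small random noise to the weights.
    Exploit: keep the perturbed weights only if the reward improves.
    """
    rng = rng or np.random.default_rng()
    baseline = reward(fuse(weights, estimates), estimates)
    candidate = np.clip(weights + sigma * rng.standard_normal(weights.shape),
                        1e-6, None)
    if reward(fuse(candidate, estimates), estimates) > baseline:
        return candidate   # exploration paid off: accept the perturbation
    return weights         # otherwise stay greedy on the current weights

# Example: one robot fusing its own estimate with two neighbours' estimates.
estimates = np.array([[1.0, 2.0], [1.2, 2.1], [0.9, 1.8]])  # 2-D target positions
weights = np.ones(3) / 3
for _ in range(50):
    weights = perturbation_greedy_step(weights, estimates)
print(weights / weights.sum())
```

In this toy reading, the weights drift toward whichever information sources make the fused estimate most consistent; the paper's actual reward signals, fusion layers, and convergence argument are developed in the main text.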
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Aleksandra_Faust1
Submission Number: 2264