Silent Data Corruption in Robot Operating System: A Case for End-to-End System-Level Fault Analysis Using Autonomous UAVs

Published: 01 Jan 2024, Last Modified: 13 Nov 2024
Venue: IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2024
License: CC BY-SA 4.0
Abstract: Safety and resiliency are essential properties of autonomous vehicles. In this research, we introduce ROSFI, the first resilience analysis methodology for the Robot Operating System (ROS), to assess the effect of silent data corruption (SDC) on mission metrics. Using unmanned aerial vehicles (UAVs) as a case study, we demonstrate that system-level metrics, such as flight time and success rate, are necessary for accurately measuring system resilience. We show that downstream ROS tasks such as planning and control are more susceptible to SDCs than the visual perception stage of the perception–planning–control (PPC) compute pipeline. This observation becomes apparent only when the complete end-to-end system-level pipeline is considered, rather than isolated compute kernels as in previous work. To enhance the safety and robustness of robot systems bound by size, weight, and power (SWaP), we offer two low-overhead, anomaly-based SDC detection and recovery algorithms built on Gaussian statistical models and autoencoder neural networks. We validate these anomaly-based error protection techniques in numerous simulated environments and demonstrate that the autoencoder-based technique can recover up to 100% of failure cases in our studied scenarios with a computational overhead of no more than 0.0062%. Finally, our open-source methodology can be used to comprehensively test the robustness of other ROS-based applications. It is available for public download at https://github.com/harvard-edge/MAVBench/tree/mavfi .
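The abstract names Gaussian statistical models as one basis for low-overhead SDC detection. A minimal sketch of that general idea follows, assuming each monitored ROS variable (here a hypothetical `velocity_x` command) is modeled as an independent Gaussian fitted on fault-free runs, and that a sample more than `k` standard deviations from the mean (or a non-finite value) is treated as a likely SDC. The class name `GaussianSDCDetector`, the threshold `k=3.0`, and the "hold the last value that passed the check" recovery policy are illustrative assumptions, not the ROSFI implementation.

```python
import math
from statistics import fmean, pstdev


class GaussianSDCDetector:
    """Hypothetical per-variable Gaussian anomaly detector (illustrative only).

    Fits a mean and standard deviation per monitored variable from
    fault-free traces, then flags new samples whose z-score exceeds k.
    """

    def __init__(self, k: float = 3.0):
        self.k = k
        self.mean: dict[str, float] = {}
        self.std: dict[str, float] = {}
        self.last_good: dict[str, float] = {}

    def fit(self, traces: dict[str, list[float]]) -> None:
        for name, values in traces.items():
            self.mean[name] = fmean(values)
            # Guard against zero variance so the z-score is always defined.
            self.std[name] = max(pstdev(values), 1e-9)
            self.last_good[name] = values[-1]

    def is_anomalous(self, name: str, value: float) -> bool:
        # A bit flip can yield NaN or inf; treat those as anomalies outright.
        if not math.isfinite(value):
            return True
        z = abs(value - self.mean[name]) / self.std[name]
        return z > self.k

    def check_and_recover(self, name: str, value: float) -> float:
        if self.is_anomalous(name, value):
            # Assumed recovery policy: fall back to the last accepted value.
            return self.last_good[name]
        self.last_good[name] = value
        return value


detector = GaussianSDCDetector(k=3.0)
detector.fit({"velocity_x": [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]})
print(detector.check_and_recover("velocity_x", 1.02))  # 1.02 (accepted)
print(detector.check_and_recover("velocity_x", 1e6))   # 1.02 (suspected SDC)
```

The appeal of this kind of check for SWaP-bound robots is its cost: one subtraction, one division, and one comparison per monitored variable. An autoencoder-based detector would typically replace the per-variable z-score with a reconstruction error computed over many variables jointly.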