Keywords: Robotics, System Diagnostics, AI Ops, LLMs
TL;DR: We conduct a case study leveraging LLMs for root cause analysis of complex robotics system issues, drawing on a decade of real-world industry data.
Abstract: Quickly resolving issues reported in industrial robotics applications is crucial to minimize economic impact. However, the required data analysis makes diagnosing the underlying root causes a challenging and time-consuming task, even for experts. In contrast, large language models (LLMs) excel at quickly analyzing large amounts of data. Indeed, prior work in AI-Ops demonstrates their effectiveness for IT systems. Here, we extend this work to the challenging and largely unexplored domain of robotics systems. To this end, we create SYSDIAGBENCH, an internal system diagnostics dataset for robotics, containing over 2 500 real-world issues. We leverage SYSDIAGBENCH to investigate the performance of LLMs for root cause analysis, considering a range of model sizes and adaptation techniques. Our results show that finetuned 7B-parameter models can outperform frontier models in terms of diagnostic accuracy while being significantly more cost-effective. We validate our LLM-as-a-judge results with a human expert study and find that our best model achieves approval ratings similar to our reference labels.
Submission Number: 21