Diagnosing Robotics Systems Issues with Large Language Models – A Case Study

Published: 06 Mar 2025, Last Modified: 19 Apr 2025
Venue: DL4C @ ICLR 2025
License: CC BY 4.0
Track: long paper (up to 9 pages)
Keywords: AI Ops, Root Cause Analysis, LLMs, Case Study
TL;DR: We conduct a case study on the application of LLMs for root cause analysis in complex industrial systems using over a decade of real world data.
Abstract: Quickly resolving issues reported in industrial robotics applications is crucial to minimize economic impact. However, the required data analysis makes diagnosing the underlying root causes a challenging and time-consuming task, even for experts. In contrast, large language models (LLMs) excel at quickly analyzing large amounts of data. Indeed, prior work in AI-Ops demonstrates their effectiveness for IT systems. Here, we extend this work to the challenging and largely unexplored domain of robotics systems. To this end, we create SYSDIAGBENCH, an internal system diagnostics dataset for robotics, containing over 2 500 real-world issues. We leverage SYSDIAGBENCH to investigate the performance of LLMs for root cause analysis, considering a range of model sizes and adaptation techniques. Our results show that finetuned 7B-parameter models can outperform frontier models in terms of diagnostic accuracy while being significantly more cost-effective. We validate our LLM-as-a-judge results with a human expert study and find that our best model achieves approval ratings similar to our reference labels.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 11