ChaosEater: Fully Automating Chaos Engineering with Large Language Models

Daisuke Kikuta; Hiroki Ikeuchi; Kengo Tajiri; Yuusuke Nakano

25 Sept 2024 (modified: 05 Feb 2025) | Submitted to ICLR 2025 | CC BY 4.0

Keywords: Chaos Engineering, Software Engineering, Infrastructure as Code, Large Language Models

TL;DR: We propose an LLM-based system for fully automating the Chaos Engineering cycle. Our system significantly reduces both time and monetary costs while completing a reasonable CE cycle.

Abstract: Chaos Engineering (CE) is an engineering technique aimed at improving the resiliency of distributed systems. It involves artificially injecting specific failures into a distributed system and observing its behavior in response. Based on these observations, the system can be proactively improved to handle those failures. Recent CE tools automate the execution of predefined CE experiments. However, defining these experiments and reconfiguring the system afterward remain manual tasks. To reduce the cost of these manual operations, we propose ChaosEater, a system for automating all CE operations with Large Language Models (LLMs). It predefines the general flow according to the systematic CE cycle and assigns subdivided operations within the flow to LLMs. We assume systems based on Infrastructure as Code (IaC), wherein the system configurations and artificial failures are managed through code. Hence, the LLMs' operations in our system correspond to software engineering tasks, including requirement definition, code generation and debugging, and testing. We validate our system through case studies on both small and large systems. The results demonstrate that our system significantly reduces both time and monetary costs while completing a reasonable CE cycle. Our code is available in the Supplementary Material.
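
The design idea in the abstract, a fixed flow whose individual steps are delegated to LLMs, can be illustrated with a minimal sketch. This is not the authors' implementation: the phase names, the `Pipeline` class, and the stub agents are all hypothetical, and plain functions stand in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: an agent reads the shared context and returns new entries.
# In a real system each agent would wrap an LLM call; here it is a plain function.
Agent = Callable[[Dict[str, str]], Dict[str, str]]

# Assumed phase names, loosely following the steps named in the abstract.
PHASES: List[str] = ["define_experiment", "run_experiment", "analyze", "reconfigure"]


@dataclass
class Pipeline:
    """Runs phases in a predefined order; only the work inside each phase is delegated."""
    phases: List[str]
    agents: Dict[str, Agent]

    def run(self, context: Dict[str, str]) -> Dict[str, str]:
        ctx = dict(context)  # copy so the caller's input is not mutated
        for phase in self.phases:
            ctx.update(self.agents[phase](ctx))
        return ctx


def stub_agents() -> Dict[str, Agent]:
    # Stand-ins for LLM-backed agents; outputs are illustrative strings only.
    return {
        "define_experiment": lambda ctx: {"experiment": "inject failure into " + ctx["manifest"]},
        "run_experiment": lambda ctx: {"observation": "ran: " + ctx["experiment"]},
        "analyze": lambda ctx: {"finding": "from " + ctx["observation"]},
        "reconfigure": lambda ctx: {"manifest": ctx["manifest"] + " (revised)"},
    }


result = Pipeline(PHASES, stub_agents()).run({"manifest": "app.yaml"})
```

Because the system configuration is code under the IaC assumption, the last phase in this sketch simply returns a revised manifest, which could feed a further cycle.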

Supplementary Material: zip

Primary Area: infrastructure, software libraries, hardware, systems, etc.

Submission Number: 5294
