Keywords: Multi-Agent Evaluation, Causality, Simulation
TL;DR: In response to the need for more domain-specific multi-agent evaluation, we propose a simulation environment inspired by causality for assessing agents in a cloud microservice autoscaling use case.
Abstract: Recent advances in large language models (LLMs) have enabled the development of intelligent agents with reasoning and planning capabilities. However, two key limitations remain: the lack of realistic domain-specific models that capture the causal system dynamics in which these agents operate, and the absence of representative simulation environments that combine LLM agents with reinforcement learning (RL) for rigorous evaluation. The cloud autoscaling problem is a compelling use case for benchmarking AI systems: it admits a causal system model while requiring agents to solve a constrained optimisation problem, minimising resource costs while meeting strict service level objectives (SLOs), with minimal intervention and interpretable actions. We use these characteristics to develop a microservice simulation environment that models the causal relations between CPU usage, memory usage, resource limits, and latency in applications of any scale and topology. It can also introduce realistic system failures.
Our simulation engine gives agents the 'licence to scale' without causing harm in real deployments. It also provides a realistic, controlled environment for RL agents and is compatible with standard RL baselines. Our work thus offers a benchmark environment for integrating LLMs, agents, causal models, and RL for adaptive decision-making in dynamic, resource-constrained environments.
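The abstract states that the environment is compatible with standard RL baselines. As a purely illustrative sketch, and not the paper's actual API, a Gymnasium-style interface over the described state variables (CPU usage, memory usage, resource limits, latency) might look as follows; all class names, the observation layout, and the toy dynamics are assumptions made for illustration.

```python
import gymnasium as gym
import numpy as np


class MicroserviceAutoscalingEnv(gym.Env):
    """Hypothetical Gym-compatible autoscaling environment (illustrative only).

    Observation per service: [cpu_usage, mem_usage, cpu_limit, latency_ms].
    Action per service: decrease / keep / increase its resource limit.
    """

    def __init__(self, num_services: int = 3, slo_latency_ms: float = 200.0):
        self.num_services = num_services
        self.slo_latency_ms = slo_latency_ms
        self.observation_space = gym.spaces.Box(
            low=0.0, high=np.inf, shape=(num_services, 4), dtype=np.float32
        )
        self.action_space = gym.spaces.MultiDiscrete([3] * num_services)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Random initial state; a real engine would sample from the causal model.
        self._state = self.np_random.uniform(
            0.2, 0.8, size=(self.num_services, 4)
        ).astype(np.float32)
        return self._state, {}

    def step(self, action):
        # Placeholder dynamics: the paper's engine would instead propagate
        # effects through causal relations (limits -> resource pressure -> latency).
        delta = (np.asarray(action) - 1) * 0.1  # -0.1, 0.0, or +0.1 per service
        self._state[:, 2] = np.clip(self._state[:, 2] + delta, 0.1, None)
        latency = 100.0 / np.maximum(self._state[:, 2], 1e-3)  # toy latency model
        self._state[:, 3] = latency.astype(np.float32)
        # Reward mirrors the constrained objective: penalise resource cost
        # and SLO violations.
        cost = float(self._state[:, 2].sum())
        violations = float((latency > self.slo_latency_ms).sum())
        reward = -cost - 10.0 * violations
        return self._state, reward, False, False, {}
```

An interface of this shape would let off-the-shelf RL baselines (e.g., agents from Stable-Baselines3) interact with the simulator directly via the standard reset/step loop.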
Archival Option: The authors of this submission do *not* want it to appear in the archival proceedings.
Submission Number: 12