Formal Evaluation of Multi-Robot Coordination: A Testing Protocol with Temporal Logic and Risk Metrics

Published: 11 Oct 2025, Last Modified: 15 Oct 2025
Venue: IROS 2025 LEAPRIDE Poster
License: CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning, Robustness and Fragility Analysis, Signal Temporal Logic, Risk-Aware Evaluation, Coordination and Role Specialization, Benchmarking Cooperative Agents
Abstract: Robust and adaptable behavior is critical for multi-agent reinforcement learning (MARL) systems deployed in dynamic and unpredictable environments. However, common evaluation practices, such as reporting the mean episodic reward, often fail to reveal coordination fragilities that can undermine reliability in practice. This paper introduces a lightweight evaluation framework that combines Signal Temporal Logic (STL) monitoring with Conditional Value-at-Risk (CVaR) analysis to expose coordination pathologies in MARL policies. Using the Steakhouse cooking simulator, we specify interpretable temporal properties of collaboration—such as fairness, activeness, and conflict resolution—and complement them with risk-sensitive performance metrics. Our experiments show that policies with similar average returns can diverge significantly in terms of robustness, role allocation, and tail-risk fragility. By bridging formal specification with empirical MARL evaluation, our framework contributes to ongoing efforts in benchmarking robustness, interpretability, and safety in multi-agent systems.
Submission Number: 18
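
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of an empirical lower-tail CVaR over episode returns and a discrete-time robustness check for an "always eventually" STL property (for example, an activeness requirement that an agent acts at least once in every short window). It is not the authors' implementation: the function names, the lower-tail CVaR convention, the window truncation at the end of the trace, and the toy data are all assumptions made for illustration, and nothing here uses the Steakhouse simulator's API.

```python
import math


def cvar_lower_tail(returns, alpha=0.1):
    """Empirical CVaR: mean of the worst ceil(alpha * n) episode returns.

    Assumption: higher return is better, so the risk lies in the lower tail.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return sum(worst) / k


def always_eventually_robustness(signal, threshold, window):
    """Robustness of G (F_[0, window] (signal >= threshold)) on a finite trace.

    Uses the standard quantitative semantics: 'eventually' is a max over the
    window, 'always' is a min over all time steps. Windows near the end of
    the trace are truncated (a simplifying assumption of this sketch).
    A positive value means the property holds with margin; a negative value
    means it is violated.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    robustness = math.inf
    for t in range(len(signal)):
        horizon = signal[t : t + window + 1]
        eventually = max(x - threshold for x in horizon)
        robustness = min(robustness, eventually)
    return robustness


# Hypothetical episode returns for two policies with the same mean (10.0).
policy_a = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
policy_b = [0, 2, 12, 12, 12, 12, 12, 12, 13, 13]

# Hypothetical per-step interaction counts for one agent in two episodes.
active_trace = [2, 0, 0, 2, 0, 0, 2]
idle_trace = [2, 0, 0, 0, 0, 0, 2]

if __name__ == "__main__":
    for name, rets in (("A", policy_a), ("B", policy_b)):
        mean = sum(rets) / len(rets)
        print(name, "mean:", mean, "CVaR(0.2):", cvar_lower_tail(rets, 0.2))
    print("active:", always_eventually_robustness(active_trace, 1, 2))
    print("idle:", always_eventually_robustness(idle_trace, 1, 2))
```

In this toy data the two policies are indistinguishable by mean return, while the CVaR of policy B is far lower, and the idle trace yields negative robustness for the activeness property. This is the kind of divergence the abstract describes; the paper's actual specifications and risk levels may differ.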