SpecMAS: A Multi-Agent System for Self-Verifying System Generation via Formal Model Checking

Rishabh Agrawal; Kaushik Tushar Ranade; Aja Khanal; Kalyan Shankar Basu; Apurva Narayan

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 poster, CC BY 4.0

Keywords: Multi-agent systems, formal verification, temporal logic, specification generation, model-checking, Large Language Models

TL;DR: SpecMAS converts natural-language SOPs into NuSMV models and autonomously self-verifies and debugs them via multi-agent temporal-logic checking, ensuring verifiable system designs.

Abstract: We present SpecMAS, a novel multi-agent system that autonomously constructs and formally verifies executable system models from natural language specifications. Given a Standard Operating Procedure (SOP) describing a target system, SpecMAS parses the specification, identifies relevant operational modes, variables, transitions, and properties, and generates a formal model in the syntax of NuSMV, an industry-standard symbolic model checker. A dedicated reasoning agent extracts both explicit and implicit properties from the SOP, and verification is performed via temporal logic model checking. If any properties fail to verify, an autonomous debugging agent analyzes counterexamples and iteratively corrects the model until all properties are satisfied. This closed-loop system design guarantees provable correctness by construction and advances the state of the art in automated, interpretable, and deployable verification pipelines. We demonstrate the generality, correctness, and practical feasibility of SpecMAS across a set of representative case studies and propose a new benchmark dataset for the evaluation and comparison of model checking performance.
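
The closed loop described in the abstract (check the properties, hand failing counterexamples to a debugging step, re-check) can be sketched as follows. This is a minimal illustration, not the SpecMAS implementation: the names `CheckResult`, `verify_and_repair`, `toy_check`, and `toy_repair` are hypothetical, the checker and repairer are string-based stand-ins for NuSMV and the LLM debugging agent, and the `max_repairs` bound is an assumption added so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Outcome of checking one property against a model (hypothetical type)."""
    prop: str
    holds: bool
    counterexample: Optional[str] = None


# Assumed interfaces: a checker maps (model text, properties) to results;
# a repairer maps (model text, failed results) to a revised model text.
Checker = Callable[[str, list[str]], list[CheckResult]]
Repairer = Callable[[str, list[CheckResult]], str]


def verify_and_repair(
    model: str,
    properties: list[str],
    check: Checker,
    repair: Repairer,
    max_repairs: int = 5,
) -> tuple[str, bool, int]:
    """Re-check after every repair; stop when all properties hold or the budget runs out."""
    repairs = 0
    while True:
        failed = [r for r in check(model, properties) if not r.holds]
        if not failed:
            return model, True, repairs
        if repairs == max_repairs:
            return model, False, repairs
        model = repair(model, failed)
        repairs += 1


def toy_check(model: str, properties: list[str]) -> list[CheckResult]:
    # Stand-in for a real model checker: a property "holds" if the model text mentions it.
    return [
        CheckResult(p, p in model, None if p in model else f"trace violating {p}")
        for p in properties
    ]


def toy_repair(model: str, failed: list[CheckResult]) -> str:
    # Stand-in for the debugging agent: patch only the first failing property per round.
    return model + "\n-- fix for " + failed[0].prop
```

Calling `verify_and_repair("MODULE main", ["AG safe", "AF done"], toy_check, toy_repair)` runs two repair rounds before both toy properties pass. In a real pipeline the `check` callable would invoke the model checker on the generated file and parse its counterexample traces, and `repair` would prompt the debugging agent with those traces.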

Supplementary Material: zip

Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)

Submission Number: 24811
