# Paper Title

This repository is the official implementation of SpecMAS: A Multi-Agent System for Self-Verifying System Generation via Formal Model Checking

![alt text](image.png)

## Code Files
Following is a one liner description of the code files used in to build our system.

- agent_templates - Prompt templates for the General Purpose Agentic Graph Node.
- api_server -  This code implements a FastAPI-based AI agent system with HuggingFace and LangChain integrations, featuring:

LLM-powered reasoning & code execution: Uses HuggingFace models (via Llama.cpp API) for multi-step reasoning and code generation
Advanced workflows: Supports ReWOO and Plan-and-Execute strategies for complex problem-solving
RAG/CRAG capabilities: Implements Retrieval-Augmented Generation with document search and corrective multi-hop reasoning
Model checking integration: Includes NuSMV planner and tools for Kripke structure extraction and property verification
Streaming interface: Provides real-time SSE streaming for agent thought processes and results
Conversation memory: Maintains chat history with buffer memory and thread management
The system combines LLM inference, code execution, knowledge retrieval, and formal verification in a unified API for interactive AI-powered workflows.

- crag_agent - Information Retriever Agent - Implements a CRAG workflow using Arxiv/Local PDFs with Chroma vectorstore and Tavily web search. Features:
Hybrid RAG: Combines Arxiv papers & local PDFs (using PyPDFLoader)
Relevance grading: Filters documents with LLM-based binary relevance scoring
Query rewriting: Optimizes questions for better retrieval
Graph-based pipeline: StateGraph workflow with conditional edges for document grading and web search
HuggingFace embeddings: Uses gte-large-en-v1.5 for document vectorization
Automatically switches between knowledge retrieval, web search, and answer generation for comprehensive question-answering

- knowledge_search_templates - Prompt Templates for the Knowledge Search tool.
- Exp_Utils -  Processes AI model evaluations for system specification tasks, computes weighted scores, and generates performance visualizations. Features:

Score aggregation: Calculates composite metrics (structural alignment, property/semantic fidelity)
Count normalization: Normalizes error counts across models (counterexamples, minor issues)
Compliance merging: Combines quality and execution scores into final weighted metrics
Visual analytics:
Debugging steps comparison bar charts
Dual-axis spec-quality vs compliance slope graphs
Radar charts for domain-specific model performance
Domain categorization: Evaluates models across 6 technical domains (protocols, concurrency, memory mgmt, etc.)
Exports CSV tables and PNG visualizations for detailed model comparison and compliance analysis.

- Extraction_templates - Prompt Templates for the Kripke, Properties and NuSMV Agent.
- mrag_templates - Prompt Templates for Multi-Hop RAG. Use in Knowledge Search Tool.
- node_definitions - Implements NuSMV/Kripke/Property verification agents with LangChain-based orchestration and LLM-powered planning. Features:

Modular planners: Kripke structure extraction, property specification, and NuSMV execution pipelines
StateGraph orchestration: Conditional execution of planning/steps with ReWOO and Plan-and-Execute strategies
Tool integration: Executes NuSMV, reads/writes to filesystem, and uses RAG (Chroma + HuggingFace embeddings)
Comprehensive verification: Tracks structural alignment, semantic fidelity, and execution compliance
Visualization tools: Generates bar charts, slope graphs, and radar plots for model performance comparison
Manages formal verification workflows by coordinating LLM planning, tool execution, and compliance scoring with support for SOP-based specification extraction and model validation.
- tool_templates - Prompt templates for tools used by the Kripke, Properties and NuSMV Agent.
- verification_templates - Includes templates for general question answering and domain-specific checks with configurable response formats and evaluation criteria.

## Results
Result are described in the paper and supplementatary section document.

## Benchmark
All the benchmark files including the Standard Operations Procedure documents, Expert written SMV files are in the Mini SpecBench file.