# EvoMAS

<div align="center">
<h3>🧬 Heuristics in the Loop—Evolving Smarter Agentic Workflows</h3>
<img src="images/cover.jpg" alt="EvoMAS Cover" width="600">
</div>

## 🚀 Project Introduction

EvoMAS (Evolutionary Multi-Agent System) is a framework that uses evolutionary algorithms to build multi-agent systems for complex tasks. It coordinates specialized agents (such as a Planner, an Actor, and a Verifier) that automatically plan, execute, and verify solutions. Starting from a task description, the framework generates agent workflows and then optimizes them through evolution. This makes it particularly well suited to mathematical reasoning, code generation, and other tasks that require multi-step reasoning.

## ✨ Core Features

- **🤝 Multi-Agent Collaboration**: Complete complex tasks through the collaboration of specialized agents, each responsible for a specific role
- **🔄 Automatic Flow Generation**: Automatically generate appropriate agent workflows and code implementations based on task descriptions
- **🧬 Evolutionary Optimization**: Use evolutionary strategies to optimize agent workflows, improving task-solving efficiency and accuracy
- **📚 Curriculum-Guided Evolution**: Progressive learning from simple to complex tasks with adaptive difficulty adjustment
- **🤖 LLM-as-a-Judge**: Automatic difficulty classification for datasets using large language models
- **✅ Result Verification and Feedback**: Automatically verify solutions and feed the results back, forming a closed optimization loop
- **🛠️ Tool Integration**: Agents can call external tools and APIs, including tools served over the Model Context Protocol (MCP)
- **📊 Benchmark Support**: Built-in support for multiple benchmark datasets including GSM8K, MATH, HumanEval, MBPP, GAIA, etc.
- **⚡ Asynchronous Execution**: Support for asynchronous multi-agent systems to improve execution efficiency
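
The sketch below illustrates the adaptive difficulty adjustment behind curriculum-guided evolution. It is not EvoMAS's implementation. `CurriculumScheduler`, the three difficulty levels, and the promotion and demotion thresholds are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

LEVELS = ["easy", "medium", "hard"]  # hypothetical difficulty buckets


@dataclass
class CurriculumScheduler:
    """Hypothetical scheduler: move to harder tasks when the population does well,
    step back to easier tasks when it struggles."""
    promote_at: float = 0.8  # accuracy needed to advance one level
    demote_at: float = 0.4   # accuracy below which we step back one level
    level: int = 0
    history: list = field(default_factory=list)

    def current_level(self) -> str:
        return LEVELS[self.level]

    def update(self, accuracy: float) -> str:
        """Record this generation's accuracy and return the level for the next generation."""
        self.history.append((self.current_level(), accuracy))
        if accuracy >= self.promote_at and self.level < len(LEVELS) - 1:
            self.level += 1
        elif accuracy < self.demote_at and self.level > 0:
            self.level -= 1
        return self.current_level()


if __name__ == "__main__":
    scheduler = CurriculumScheduler()
    for acc in [0.9, 0.85, 0.3, 0.7]:
        print(scheduler.update(acc))  # medium, hard, medium, medium
```

The gap between the two thresholds gives the scheduler some hysteresis. An accuracy between 0.4 and 0.8 keeps the current level instead of oscillating.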

## 🧠 Core Components

### Agent System
- **🗺️ Planner**: Responsible for analyzing tasks and generating execution plans
- **🎭 Actor**: Responsible for executing specific actions according to the plan
- **🔍 Verifier**: Responsible for validating the correctness of results
- **🔧 ActorMCP**: Execution agent supporting tool calls, capable of interacting with external tools
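
These roles form a closed loop: plan, act, verify, then re-plan on feedback. The sketch below shows that loop, but it is an illustration under assumptions rather than the code in `agent/Agent.py`. The `solve` function, the role signatures, and the `max_rounds` cap are hypothetical. The roles are plain functions instead of LLM-backed agents, so the example runs offline.

```python
from typing import Callable, List, Optional

# Hypothetical role signatures; in EvoMAS each role is an LLM-backed agent.
Planner = Callable[[str, Optional[str]], List[str]]  # (task, feedback) -> plan steps
Actor = Callable[[str, List[str]], str]              # (task, steps) -> answer
Verifier = Callable[[str, str], Optional[str]]       # (task, answer) -> None if OK, else feedback


def solve(task: str, plan: Planner, act: Actor, verify: Verifier, max_rounds: int = 3) -> str:
    """Plan -> act -> verify, feeding verifier feedback back into the planner."""
    feedback: Optional[str] = None
    answer = ""
    for _ in range(max_rounds):
        steps = plan(task, feedback)
        answer = act(task, steps)
        feedback = verify(task, answer)
        if feedback is None:  # the verifier accepted the answer
            break
    return answer  # last attempt, even if never accepted


if __name__ == "__main__":
    def toy_plan(task, feedback):
        # The second plan reacts to the verifier's feedback.
        return ["add"] if feedback is None else ["add", "double-check carry"]

    def toy_act(task, steps):
        a, b = (int(x) for x in task.split("+"))
        # Simulate a careless first attempt that drops a carry.
        return str(a + b) if "double-check carry" in steps else str(a + b - 10)

    def toy_verify(task, answer):
        a, b = (int(x) for x in task.split("+"))
        return None if answer == str(a + b) else "arithmetic error, recheck carries"

    print(solve("999+1111", toy_plan, toy_act, toy_verify))  # 2110
```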

### Evolutionary Optimization Module
- **🔄 FlowGen**: Agent flow generator that automatically creates multi-agent systems based on task descriptions
- **🧬 EvoOptimizer**: Evolutionary optimizer using evolutionary algorithms to optimize agent workflows
- **🧬 FlowGenome**: Flow genome representing evolvable agent workflows
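
The sketch below gives a rough picture of how a genome and an evolutionary optimizer fit together. It is not the `FlowGenome` or `EvoOptimizer` code, and it makes several simplifying assumptions:

- A genome is just a tuple of role names.
- Mutation adds, removes, or replaces a role.
- Selection is simple elitism.
- A toy fitness function stands in for benchmark accuracy.

In EvoMAS, a genome is generated workflow code (`flow["code"]`), and the `evolve` call shown later takes an `llm` argument. That suggests the variation itself is LLM-driven.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

ROLES = ["Planner", "Actor", "Verifier", "ActorMCP"]  # role names from the list above


@dataclass(frozen=True)
class Genome:
    """Hypothetical genome: an ordered tuple of agent roles forming a workflow."""
    roles: tuple


def mutate(g: Genome, rng: random.Random) -> Genome:
    """Add, remove, or replace one role; never produces an empty workflow."""
    roles = list(g.roles)
    op = rng.choice(["add", "remove", "replace"])
    if op == "add" or len(roles) <= 1:
        roles.insert(rng.randrange(len(roles) + 1), rng.choice(ROLES))
    elif op == "remove":
        roles.pop(rng.randrange(len(roles)))
    else:
        roles[rng.randrange(len(roles))] = rng.choice(ROLES)
    return Genome(tuple(roles))


def evolve(pop: List[Genome], fitness: Callable[[Genome], float],
           generations: int, rng: random.Random) -> Genome:
    """Elitist loop: mutate every parent, keep the best len(pop) of parents + children."""
    size = len(pop)
    for _ in range(generations):
        children = [mutate(g, rng) for g in pop]
        pop = sorted(pop + children, key=fitness, reverse=True)[:size]
    return max(pop, key=fitness)


if __name__ == "__main__":
    def toy_fitness(g: Genome) -> float:
        # Stand-in for benchmark accuracy: reward covering the core roles, penalize length.
        return len(set(g.roles) & {"Planner", "Actor", "Verifier"}) - 0.1 * len(g.roles)

    start = [Genome(("Actor",)), Genome(("Actor", "Actor"))]
    best = evolve(start, toy_fitness, generations=30, rng=random.Random(0))
    print(best.roles, toy_fitness(best))
```

Parents compete with their children in every generation, so the best fitness in the population never decreases.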

### Tool Integration
- **🛠️ UnifiedTool**: Unified tool interface supporting both function and MCP server tools
- **📡 MCP Servers**: External tool servers that implement the Model Context Protocol (MCP)
- **⚙️ ToolRegistry**: Tool registry for managing all available tools
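
The sketch below shows what a tool registry typically does: it registers a plain Python function and derives an OpenAI-style function-calling schema from its signature with `inspect`. `MiniToolRegistry` is hypothetical and much smaller than `ToolRegistry` in `tools/unified_tool.py`. Only `register_function` mirrors a method name used later in this README, and its behavior here is an assumption.

```python
import inspect
from typing import Any, Callable, Dict

# Map Python annotations to JSON Schema type names (a small subset, for illustration).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


class MiniToolRegistry:
    """Hypothetical, simplified registry; not EvoMAS's ToolRegistry."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register_function(self, fn: Callable[..., Any]) -> None:
        self._tools[fn.__name__] = fn

    def schema(self, name: str) -> Dict[str, Any]:
        """Describe a registered tool in the OpenAI function-calling format."""
        fn = self._tools[name]
        params = inspect.signature(fn).parameters.values()
        props = {p.name: {"type": _JSON_TYPES.get(p.annotation, "string")} for p in params}
        required = [p.name for p in params if p.default is inspect.Parameter.empty]
        return {
            "type": "function",
            "function": {
                "name": name,
                "description": inspect.getdoc(fn) or "",
                "parameters": {"type": "object", "properties": props, "required": required},
            },
        }

    def call(self, name: str, **kwargs: Any) -> Any:
        return self._tools[name](**kwargs)


if __name__ == "__main__":
    def add(a: float, b: float) -> float:
        """Add two numbers."""
        return a + b

    registry = MiniToolRegistry()
    registry.register_function(add)
    print(registry.schema("add")["function"]["parameters"])
    print(registry.call("add", a=2, b=3))  # 5
```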

## 📁 Directory Structure

```
EvoMAS/
├── agent/                    # Core agent implementations
│   ├── Agent.py             # Synchronous agent base classes and implementation
│   └── Agent_async.py       # Asynchronous agent base classes and implementation
├── tools/                   # Tool integration module
│   ├── base.py             # MCP server base classes
│   ├── unified_tool.py     # Unified tool interface
│   └── __init__.py
├── mcp_servers/            # MCP server configurations
├── evo/                    # Curriculum-guided evolution modules
│   ├── curriculum_guided_evolution.py    # Curriculum-guided evolution implementation
│   └── curriculum_example.py             # Curriculum evolution example script
├── Benchmark/              # Benchmark datasets
│   ├── gsm8k-test.jsonl   # GSM8K math word problems
│   ├── mbpp.jsonl         # MBPP code generation benchmark
│   ├── MATH/              # MATH competition mathematics problems
│   ├── humaneval/         # HumanEval code generation evaluation
│   └── gaia/              # GAIA comprehensive AI evaluation
├── Evaluation/            # Evaluation tools
│   ├── Eval_GSM8k.py     # GSM8K evaluation script
│   ├── Eval_MATH.py      # MATH evaluation script
│   └── ...
├── results/               # Results output directory
├── logs/                  # Log files
├── utils/                 # Utility functions
├── FlowGen.py                        # Agent flow generator
├── evo_optimizer.py                  # Evolutionary optimizer
├── flow.py                           # Main multi-agent system flow
├── main.py                           # Project entry example
├── Agent_block.py                    # Agent building blocks
├── flow_genome_tmp.py                # Temporary generated flow genome
├── requirements.txt                  # Project dependencies
└── README.md                         # Project documentation
```

## 📦 Installation & Usage

### Requirements

- Python >= 3.8
- An LLM service: the OpenAI API or another OpenAI-compatible service
- MCP (Model Context Protocol) support for external tool servers
- uv is recommended as the package manager for faster installation

### Installation Steps

1. **Clone the repository**
   ```bash
   git clone https://anonymous.4open.science/r/EvoMAS-DEF4
   cd EvoMAS
   ```

2. **Install dependencies**
   ```bash
   # Using pip
   pip install -r requirements.txt
   
   # Or using uv (recommended for speed)
   pip install uv  # Install uv first
   uv pip install -r requirements.txt
   ```

3. **Configure environment variables**
   ```bash
   # Create .env file and set API key
   echo "OPENAI_API_KEY=your_api_key_here" > .env
   ```
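
To confirm that the key will be found, you can run the stdlib-only sketch below, which loads `.env` into the process environment. `load_env` is a hypothetical helper, not part of EvoMAS. It assumes plain `KEY=value` lines. The project may read the file differently, for example with `python-dotenv`.

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=value lines, '#' comments, optional surrounding quotes.
    Variables already set in the environment are not overwritten."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


if __name__ == "__main__":
    load_env()
    # Report only whether the key is present; never print the key itself.
    print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))
```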

### Basic Usage

1. **Run example**
   ```bash
   python main.py
   ```

2. **Custom agent flow**
   ```python
   import asyncio

   from FlowGen import FlowGen
   from tools.unified_tool import ToolRegistry

   async def main():
       # Create tool registry
       tool_registry = ToolRegistry()

       # Generate an agent flow from a task description
       flow = FlowGen("Solve middle school math problems", tools=None).generation()

       # Save the generated code so it can be imported as a module
       with open("flow_genome_tmp.py", "w", encoding="utf-8") as f:
           f.write(flow["code"])

       # Use the generated multi-agent system (run() is a coroutine)
       from flow_genome_tmp import MultiAgentSystem
       system = MultiAgentSystem("Test", tool_registry.get_all_tools())
       result = await system.run("Calculate 999+1111")
       print(result["answer"])

   asyncio.run(main())
   ```

3. **Using evolutionary optimization**
   ```python
   from evo_optimizer import EvoOptimizer
   
   # Initialize evolutionary optimizer
   optimizer = EvoOptimizer(
       "Solve complex middle school math problems",
       tools=None,
       population_size=2,
       llm="gpt-4o-mini"
   )
   
   # Run evolutionary optimization
   best_individual = optimizer.evolve(generations=5, benchmark_type="gsm-8k")
   
   # Evaluate best individual
   optimizer.evaluate(best_individual, mode="test")
   ```

4. **Using curriculum-guided evolution**
   ```python
   from curriculum_guided_evolution import CurriculumGuidedEvoOptimizer
   
   # Initialize curriculum-guided evolutionary optimizer
   optimizer = CurriculumGuidedEvoOptimizer(
       "Solve mathematical reasoning problems step by step",
       population_size=3,
       enable_curriculum=True,
       llm="gpt-4o-mini"
   )
   
   # Run curriculum-guided evolution
   best_individual, curriculum_history = optimizer.evolve_with_curriculum(
       generations=10, 
       benchmark_type="gsm-8k"
   )
   
   # Analyze curriculum progress
   optimizer.analyze_curriculum_progress()
   ```

5. **Running curriculum evolution example**
   ```bash
   # Basic curriculum-guided evolution
   python curriculum_example.py --benchmark gsm-8k --generations 10
   
   # Compare curriculum vs traditional evolution
   python curriculum_example.py
   # Then select option 2 for comparison
   ```

6. **Using LLM-as-a-judge for difficulty classification (NEW!)**
   ```python
   from curriculum_guided_evolution import DifficultyClassifier, CurriculumManager
   
   # Initialize classifier
   classifier = DifficultyClassifier(llm="gpt-4o-mini")
   
   # Classify a dataset
   results = classifier.classify_dataset(
       dataset_path="path/to/your/dataset.jsonl",
       domain="gsm8k",  # or "math", "general"
       sample_size=50,
       output_path="results/classification.json"
   )
   
   # Create curriculum configuration from classification
   curriculum_config = classifier.create_curriculum_config_from_classification(
       results, "my_dataset"
   )
   
   # Use with curriculum manager
   manager = CurriculumManager()
   manager.add_llm_classified_dataset(
       dataset_path="path/to/dataset.jsonl",
       dataset_name="my_dataset",
       domain="gsm8k"
   )
   ```

7. **Running LLM-based classification example**
   ```bash
   # Interactive classification tool
   python difficulty_classification_example.py
   
   # Command line classification
   python difficulty_classification_example.py --dataset_path Benchmark/gsm8k-test.jsonl --domain gsm8k --sample_size 20
   
   # Full pipeline: classification + curriculum evolution
   python curriculum_example.py
   # Then select option 4
   ```

8. **Tool integration example**
   ```python
   import asyncio
   from tools.unified_tool import ToolRegistry
   from tools.base import Configuration, Server
   
   async def setup_tools():
       tool_registry = ToolRegistry()
       
       # Register function tools
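       def calculator(operation: str, a: float, b: float) -> float:
           """Apply a basic arithmetic operation to a and b."""
           if operation == "add":
               return a + b
           if operation == "subtract":
               return a - b
           if operation == "multiply":
               return a * b
           if operation == "divide":
               return a / b
           raise ValueError(f"Unsupported operation: {operation}")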
       
       tool_registry.register_function(calculator)
       
       # Load MCP servers
       config = Configuration()
       server_config = config.load_config("mcp_servers/servers_config.json")
       
       servers = [
           Server(name, srv_config)
           for name, srv_config in server_config["mcpServers"].items()
       ]
       
       # Initialize and register server tools
       for server in servers:
           await server.initialize()
           await tool_registry.register_server_tools(server)
       
       return tool_registry, servers
   ```
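
The LLM-as-a-judge pattern used for difficulty classification in step 6 boils down to three parts: a prompt, a label parser, and bucketing. The sketch below is a hypothetical illustration, not `DifficultyClassifier`. The prompt wording, the three labels, and the fallback to `"medium"` are all assumptions, and so is the `judge` callable, which would wrap an LLM such as `gpt-4o-mini`.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List

LABELS = ("easy", "medium", "hard")

# Hypothetical prompt; the real DifficultyClassifier prompt is not shown in this README.
JUDGE_PROMPT = (
    "Rate the difficulty of the following problem as easy, medium, or hard. "
    "Reply with a single word.\n\nProblem: {question}"
)


def parse_label(reply: str) -> str:
    """Take the first recognized label in the judge's reply; default to 'medium'."""
    match = re.search(r"\b(easy|medium|hard)\b", reply.lower())
    return match.group(1) if match else "medium"


def classify(questions: List[str], judge: Callable[[str], str]) -> Dict[str, List[str]]:
    """Bucket questions by judged difficulty, returned in curriculum order."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for q in questions:
        buckets[parse_label(judge(JUDGE_PROMPT.format(question=q)))].append(q)
    return {label: buckets.get(label, []) for label in LABELS}


if __name__ == "__main__":
    def toy_judge(prompt: str) -> str:
        # Stand-in for an LLM call.
        return "Hard." if "integral" in prompt else "This one is easy"

    print(classify(["What is 2+3?", "Evaluate the integral of x^2 from 0 to 1."], toy_judge))
    # {'easy': ['What is 2+3?'], 'medium': [], 'hard': ['Evaluate the integral of x^2 from 0 to 1.']}
```

Falling back to a middle label keeps an unparseable reply from silently landing in the easiest or hardest stage.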

## 📊 Benchmarks

EvoMAS supports multiple benchmark datasets to evaluate system performance:

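- **GSM8K**: Grade-school math word problems that require multi-step arithmetic reasoning
- **MATH**: Competition mathematics problems (e.g., from AMC and AIME) covering algebra, geometry, number theory, and more
- **HumanEval**: Hand-written Python programming problems that test function synthesis from docstrings, scored by unit tests
- **MBPP**: Mostly Basic Python Problems, entry-level Python programming tasks with test cases
- **GAIA**: Real-world questions for general AI assistants that often require web browsing and tool use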

To run a benchmark evaluation, for example on GSM8K:
```bash
python Evaluation/Eval_GSM8k.py
```
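
GSM8K reference solutions end with a line of the form `#### <answer>`. A common scoring rule compares that number with the last number in the model's output. The sketch below implements this rule. Whether `Eval_GSM8k.py` uses exactly this rule is an assumption, and the helper names are hypothetical.

```python
import re
from typing import List, Optional

# Integers or decimals, optionally negative, with optional thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def gold_answer(solution: str) -> str:
    """GSM8K solutions end with '#### <answer>'."""
    return solution.split("####")[-1].strip().replace(",", "")


def predicted_answer(text: str) -> Optional[str]:
    """Use the last number in the model output, with separators removed."""
    numbers = _NUMBER.findall(text)
    return numbers[-1].replace(",", "") if numbers else None


def accuracy(predictions: List[str], solutions: List[str]) -> float:
    def same(pred: Optional[str], gold: str) -> bool:
        if pred is None:
            return False
        try:
            return float(pred) == float(gold)  # treats "3.0" and "3" as equal
        except ValueError:
            return pred == gold
    hits = sum(same(predicted_answer(p), gold_answer(s)) for p, s in zip(predictions, solutions))
    return hits / len(solutions) if solutions else 0.0


if __name__ == "__main__":
    preds = ["999 + 1111 = 2,110", "The answer is 7."]
    golds = ["... #### 2110", "... #### 8"]
    print(accuracy(preds, golds))  # 0.5
```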

## 🔧 Advanced Configuration

### LLM Service Configuration

Configure the LLM service provider and API key before use, for example in the `.env` file:

```bash
# .env file configuration example
OPENAI_API_KEY=your_openai_api_key
```

### MCP Tool Integration

Configure external tool servers through `mcp_servers/servers_config.json`:

```json
{
  "mcpServers": {
    "server_name": {
      "command": "server_command",
      "args": ["arg1", "arg2"],
      "env": {}
    }
  }
}
```
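
It can help to sanity-check this file before starting EvoMAS. The hypothetical `launch_specs` helper below parses the documented `mcpServers` shape with the standard library and fails early if a server has no `command`. Layering each server's `env` on top of the current environment is an assumption about how servers are launched. The README itself loads the file with `Configuration.load_config`.

```python
import json
import os
from typing import Dict, List, NamedTuple


class LaunchSpec(NamedTuple):
    name: str
    command: str
    args: List[str]
    env: Dict[str, str]


def launch_specs(config_text: str) -> List[LaunchSpec]:
    """Parse an mcpServers config into per-server launch specs (hypothetical helper)."""
    servers = json.loads(config_text).get("mcpServers", {})
    specs = []
    for name, cfg in servers.items():
        if "command" not in cfg:
            raise ValueError(f"server {name!r} has no 'command'")
        # Server-specific variables override the inherited environment.
        env = {**os.environ, **cfg.get("env", {})}
        specs.append(LaunchSpec(name, cfg["command"], list(cfg.get("args", [])), env))
    return specs


if __name__ == "__main__":
    path = "mcp_servers/servers_config.json"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for spec in launch_specs(f.read()):
                print(spec.name, spec.command, " ".join(spec.args))
```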

### Evolution Strategy Configuration

Five evolution strategy types are supported (`"1"` to `"5"`); choose one with the `strategy_type` argument when evolving a population:

```python
# Evolve a new population from the parent individuals `pops` using strategy "1"
new_population = flowevo.evolve(strategy_type="1", parents=pops, llm="gpt-4o-mini")
```

## 🎯 Use Cases

- **Mathematical Reasoning**: Automatically solve various math problems, from basic arithmetic to advanced mathematics
- **Code Generation**: Generate and optimize code solutions
- **Complex Problem Solving**: Complex tasks requiring multi-step reasoning and verification
- **Tool-calling Tasks**: Tasks requiring interaction with external tools and APIs
- **Benchmark Testing**: Evaluate AI system performance on standardized datasets

## 💡 Contributing

We welcome contributions of all forms, including but not limited to:

- Submitting issues and feature requests
- Improving documentation
- Fixing bugs and adding new features
- Adding new agent types and optimization strategies
- Contributing new benchmark datasets

Feel free to submit issues or pull requests to participate in project development.

## 📄 License

MIT License

---

<div align="center">
<p>🌟 If this project helps you, please give us a Star! 🌟</p>
</div>
