# Deep Research Self-Evolving Agents

A multi-agent system for automated deep research, built from specialized, self-evolving agent blocks. The system orchestrates dynamic workflows that combine search, browsing, and reasoning agents to answer complex research questions.

## 🚀 System Features

### Core Capabilities
- **Dynamic Workflow Orchestration**: Automatically designs and executes multi-step research workflows
- **Specialized Agent Blocks**: Modular components including searchers, browsers, summarizers, and thinkers
- **Multi-Model Support**: Compatible with OpenAI GPT models and QwQ reasoning models
- **Memory Management**: Optional persistent memory for improved performance across sessions
- **Benchmark Evaluation**: Built-in support for GAIA and BrowseComp benchmark datasets

### Agent Types
- **Searchers**: Generate queries and conduct web searches (`searcher`, `fast_searcher`, `advanced_searcher`)
- **Browsers**: Extract and analyze web content (`browser`, `fast_browser`, `advanced_browser`, `deep_browser`)
- **Summarizers**: Synthesize information from multiple sources (`summarizer`, `advanced_summarizer`)
- **Thinkers**: Reason about existing information without new searches (`thinker`)

### Workflow Patterns
1. **Thinker-Summarizer**: For questions answerable with existing information
2. **Searcher-Browser-Summarizer**: For questions requiring new web research
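
As a rough mental model, each pattern can be read as an ordered list of block names. The sketch below is purely illustrative: `PATTERNS` and `choose_pattern` are hypothetical names that do not come from this codebase, and since the orchestrator is described as designing workflows automatically, its real selection logic is likely richer than a single boolean.

```python
# Block names come from the Agent Types list above; the mapping is illustrative only.
PATTERNS = {
    "thinker-summarizer": ["thinker", "summarizer"],
    "searcher-browser-summarizer": ["searcher", "browser", "summarizer"],
}


def choose_pattern(needs_web_research: bool) -> list[str]:
    """Return the block sequence for a question (hypothetical helper)."""
    if needs_web_research:
        return PATTERNS["searcher-browser-summarizer"]
    return PATTERNS["thinker-summarizer"]
```

A question that can be answered from information already gathered would map to `choose_pattern(False)`, while one that needs fresh sources would map to `choose_pattern(True)`.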

## 🛠️ Environment Setup

### 1. Install Dependencies
```bash
# Install the package and dependencies
pip install -e .

# For development
pip install -e ".[dev]"  # quotes keep shells such as zsh from expanding the brackets
```

### 2. Configure API Keys
Create a `.env` file in the project root with the following keys:

```bash
# Required for OpenAI models
OPENAI_API_KEY=your_openai_api_key_here

# Required only when using OpenRouter (alternative to OpenAI; also used by the evaluators)
OPENROUTER_API_KEY=your_openrouter_api_key_here

# Required for QwQ models
DASHSCOPE_API_KEY=your_dashscope_api_key_here

# Required for web search functionality
SERP_API_KEY=your_serpapi_key_here

# Required for web content extraction
JINA_API_KEY=your_jina_api_key_here
```
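
Before launching a run, it can help to confirm that the keys above are actually set. The following is a minimal sketch using only the standard library; `parse_env_text` and `missing_keys` are hypothetical helpers, not part of this project, and which keys you really need depends on the models and tools you use.

```python
import os

# Key names taken from the .env template above.
ALL_KEYS = [
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
    "DASHSCOPE_API_KEY",
    "SERP_API_KEY",
    "JINA_API_KEY",
]


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    # Merge the process environment with a local .env file, if one exists.
    env = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            env.update(parse_env_text(handle.read()))
    print("Missing keys:", missing_keys(env, ALL_KEYS))
```

Passing a shorter `required` list (for example, omitting `DASHSCOPE_API_KEY` when only OpenAI models are used) keeps the check from flagging keys you do not need.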

### 3. API Key Setup Guide

#### OpenAI API Key
1. Visit [OpenAI API Platform](https://platform.openai.com/api-keys)
2. Create a new API key
3. Add it to your `.env` file as `OPENAI_API_KEY`

#### OpenRouter API Key (Alternative to OpenAI)
1. Visit [OpenRouter](https://openrouter.ai/)
2. Sign up and create an API key
3. Add it to your `.env` file as `OPENROUTER_API_KEY`

OpenRouter provides access to multiple models through a single API.

#### DashScope API Key (for QwQ models)
1. Visit [Alibaba Cloud DashScope](https://dashscope.console.aliyun.com/)
2. Create an account and get your API key
3. Add it to your `.env` file as `DASHSCOPE_API_KEY`

#### SerpAPI Key
1. Visit [SerpAPI](https://serpapi.com/)
2. Sign up and get your API key
3. Add it to your `.env` file as `SERP_API_KEY`

#### Jina API Key
1. Visit [Jina AI](https://jina.ai/)
2. Create an account and get your API key
3. Add it to your `.env` file as `JINA_API_KEY`

## 🎯 Running Single Research Tasks

### Static Workflow (Pre-defined Pipeline)
```bash
# Using OpenAI models
python -m agents.openai.test "What are the latest developments in quantum computing?"

# Using QwQ reasoning models
python -m agents.qwq.test "What are the latest developments in quantum computing?"
```

### Dynamic Workflow (Orchestrated Pipeline)
```bash
# Using OpenAI models
python -m workflow.openai.test "What are the environmental impacts of cryptocurrency mining?"

# Using QwQ reasoning models
python -m workflow.qwq.test "What are the environmental impacts of cryptocurrency mining?"
```

## 📊 Running Benchmark Evaluations

### GAIA Benchmark

The GAIA benchmark tests general AI assistant capabilities with real-world questions.

#### Without Memory
```bash
# OpenAI models
python -m run.gaia.run_openai_gaia_without_memory --level 1 --max_test 10

# QwQ models
python -m run.gaia.run_qwq_gaia_without_memory --level 1 --max_test 10
```

#### With Memory (Enhanced Performance)
```bash
# OpenAI models
python -m run.gaia.run_openai_gaia_with_memory --level 1 --max_test 10

# QwQ models
python -m run.gaia.run_qwq_gaia_with_memory --level 1 --max_test 10
```

#### GAIA Parameters
- `--level`: Difficulty level (1, 2, or 3)
- `--split`: Dataset split ("validation" or "test")
- `--max_test`: Maximum number of questions to evaluate
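
To sweep several levels or settings, the commands above can be assembled programmatically. This is a minimal sketch, assuming the module naming seen in the examples (`run.gaia.run_<model>_gaia_<with|without>_memory`) and only the three documented flags; `gaia_command` is a hypothetical helper, not something the project ships.

```python
import sys


def gaia_command(model, with_memory, level, max_test, split=None):
    """Build the argv list for one GAIA run using the documented flags."""
    if model not in ("openai", "qwq"):
        raise ValueError(f"unknown model family: {model}")
    memory = "with_memory" if with_memory else "without_memory"
    argv = [
        sys.executable, "-m", f"run.gaia.run_{model}_gaia_{memory}",
        "--level", str(level),
        "--max_test", str(max_test),
    ]
    # --split is appended only when given, so the script's own default applies otherwise.
    if split is not None:
        argv += ["--split", split]
    return argv


# One command per difficulty level for the OpenAI agents without memory.
commands = [gaia_command("openai", False, level, 10) for level in (1, 2, 3)]
```

Each entry in `commands` can be passed to `subprocess.run(cmd, check=True)` from the project root to execute the runs one after another.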

### BrowseComp Benchmark

The BrowseComp benchmark evaluates web browsing and information extraction capabilities.

#### Without Memory
```bash
# OpenAI models
python -m run.browsecomp.run_openai_browsecomp_without_memory --max_test 10

# QwQ models
python -m run.browsecomp.run_qwq_browsecomp_without_memory --max_test 10
```

#### With Memory (Enhanced Performance)
```bash
# OpenAI models
python -m run.browsecomp.run_openai_browsecomp_with_memory --max_test 10

# QwQ models
python -m run.browsecomp.run_qwq_browsecomp_with_memory --max_test 10
```

#### BrowseComp Parameters
- `--max_test`: Maximum number of questions to evaluate
- `--topic`: Filter by specific topic (optional)

## 📈 Evaluation and Results

### Benchmark Execution vs Evaluation

The benchmark process involves two steps:

1. **Execution**: Run benchmark scripts to generate answers
2. **Evaluation**: Use evaluator modules to score the results

### Step 1: Generate Results
The benchmark scripts generate JSON files with predictions:
- GAIA results: `[model]_gaia_results_level[X]_[memory_status].json`
- BrowseComp results: `[model]_browsecomp_results_[topic]_[memory_status].json`
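
When scripting the evaluation step, the expected file name can be derived from these patterns. The helpers below are a hypothetical sketch: the `with_memory` spelling matches the evaluation examples in this README, while `without_memory` is an assumption inferred from the run script names.

```python
def memory_status(with_memory: bool) -> str:
    """Map the memory setting to the suffix used in result file names."""
    # "without_memory" is assumed by analogy with the run script names.
    return "with_memory" if with_memory else "without_memory"


def gaia_result_name(model: str, level: int, with_memory: bool) -> str:
    """Fill in [model]_gaia_results_level[X]_[memory_status].json."""
    return f"{model}_gaia_results_level{level}_{memory_status(with_memory)}.json"


def browsecomp_result_name(model: str, topic: str, with_memory: bool) -> str:
    """Fill in [model]_browsecomp_results_[topic]_[memory_status].json."""
    return f"{model}_browsecomp_results_{topic}_{memory_status(with_memory)}.json"
```

For instance, `gaia_result_name("openai", 1, True)` yields the file name passed to `--result_json_path` in the GAIA evaluation command.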

### Step 2: Evaluate Results
Use the evaluation modules to score your results:

#### GAIA Evaluation
```bash
python -m evaluation.gaia_evaluator --level 1 --split validation --result_json_path openai_gaia_results_level1_with_memory.json
```

#### BrowseComp Evaluation
```bash
python -m evaluation.browsecomp_evaluator --topic All --result_json_path openai_browsecomp_results_All_with_memory.json
```

### Evaluation Requirements
- The evaluators require `OPENROUTER_API_KEY` to call the evaluation LLM
- Scoring is automatic and uses a DeepSeek model
- Predicted answers are compared against the ground truth

## 🔧 Advanced Configuration

### Model Selection
Edit configuration files to customize models:
- `agents/openai/config.py` - OpenAI model settings
- `agents/qwq/config.py` - QwQ model settings
- `orchestrator/*/config.py` - Orchestrator model settings

### Memory Management
Enable memory for improved performance across sessions:
```python
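# The import path follows the memory/ directory of this repository
from memory.memory_manager import MemoryManager
# Create a manager instance with its default settings
memory_manager = MemoryManager()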
```

### Custom Workflows
Create custom research workflows by:
1. Defining new agent blocks in `agents/`
2. Creating workflow templates in `orchestrator/`
3. Adding evaluation metrics in `evaluation/`

## 📁 Project Structure
```
deep_research/
├── agents/           # Agent building blocks
│   ├── openai/      # OpenAI-based agents
│   └── qwq/         # QwQ-based agents
├── orchestrator/    # Workflow orchestration
├── workflow/        # Workflow managers
├── evaluation/      # Benchmark evaluators
├── datasets/        # Dataset loaders (GAIA, BrowseComp)
├── memory/          # Memory management
├── run/             # Benchmark execution scripts
└── README.md
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🆘 Troubleshooting

### Common Issues

**API Key Errors**: Ensure all required API keys are set in the `.env` file

**Import Errors**: Install dependencies with `pip install -e .`

**Memory Issues**: Reduce the `--max_test` value for large evaluations

**Network Errors**: Check internet connection and API service status

### Support
For issues and questions, please open a GitHub issue with:
- Error message and stack trace
- System information (OS, Python version)
- Steps to reproduce the problem