# Deep Research Integration

This document describes the integration of the Open Deep Research (ODR) workflow into the GSM Agent evaluation system.

## Overview

The Deep Research integration adds a multi-agent research workflow to the evaluation system, aimed at mathematical problems that benefit from decomposition and parallel research. Instead of a single ReAct agent, the deep research workflow uses:

1. **Supervisor Agent**: Plans and coordinates research tasks
2. **Multiple Researcher Agents**: Conduct focused research on specific aspects
3. **Research Compression**: Synthesizes findings into structured summaries
4. **Final Report Generation**: Creates comprehensive solution reports

## Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Query    │───▶│   Clarification  │───▶│ Research Brief  │
└─────────────────┘    │   (Optional)     │    │   Generation    │
                       └──────────────────┘    └─────────────────┘
                                                         │
                       ┌─────────────────┐              ▼
                       │ Final Report    │    ┌──────────────────┐
                       │   Generation    │◀───│    Supervisor    │
                       └─────────────────┘    │    Workflow      │
                                              └──────────────────┘
                                                         │
                                              ┌──────────────────┐
                                              │   Researcher     │
                                              │   Subgraphs      │
                                              │  (Parallel)      │
                                              └──────────────────┘
                                                         │
                                              ┌──────────────────┐
                                              │    Search &      │
                                              │   Compression    │
                                              └──────────────────┘
```

## Key Components

### 1. Agent Types
- **AgentType.REACT**: Traditional ReAct agent (existing)
- **AgentType.DEEP_RESEARCH**: New multi-agent research workflow
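
As an illustration of how the two agent types could be selected from the `agent_type` config string, here is a minimal sketch. The value `"deep_research"` comes from the configuration example in this document; the value `"react"` and the helper `parse_agent_type` are assumptions made for illustration, and the real `AgentType` definition may differ.

```python
from enum import Enum


class AgentType(str, Enum):
    """Agent implementations selectable via `agent_type` in the config."""

    REACT = "react"                  # assumed string value, not confirmed by this document
    DEEP_RESEARCH = "deep_research"  # matches agent_type: "deep_research" in the config


def parse_agent_type(raw: str) -> AgentType:
    """Map a config string to an AgentType, failing loudly on unknown values.

    Hypothetical helper: the real system may resolve agent types differently.
    """
    try:
        return AgentType(raw.strip().lower())
    except ValueError:
        valid = ", ".join(t.value for t in AgentType)
        raise ValueError(f"Unknown agent_type {raw!r}; expected one of: {valid}") from None
```

Failing on unknown values keeps a typo in the config from silently falling back to the wrong agent.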

### 2. Configuration
- **DeepResearchConfig**: Configuration for research parameters
- **Integration with existing model configs**: Deep research settings sit alongside the existing model settings (`model_name`, `temperature`) in the same configuration file

### 3. Workflow Components
- **DeepResearchWorkflow**: Main workflow orchestrator
- **Supervisor/Researcher subgraphs**: Modular workflow components
- **Search integration**: Uses the existing ChromaDB search engine

## Usage

### 1. Configuration Setup

Create a configuration file with `agent_type: "deep_research"`:

```yaml
# Example: configs/deep_research_evaluation.yaml
shared_parameters:
  agent_type: "deep_research"  # Enable deep research workflow
  model_name: "gpt-4o"
  temperature: 0.3
  
  # Deep research specific settings
  deep_research_config:
    max_concurrent_research_units: 2
    max_researcher_iterations: 3
    max_react_tool_calls: 6
    allow_clarification: false
```

### 2. Running Evaluations

Use the standard evaluation command:

```bash
# Run the deep research evaluation.
# The comparison runs against traditional ReAct agents are defined in the same
# config file (see "Example Configuration Files"), so one command covers both.
python -m src.main evaluate --config configs/deep_research_evaluation.yaml
```

### 3. Configuration Parameters

#### Core Deep Research Parameters
- **max_concurrent_research_units**: Number of parallel researchers (default: 3)
- **max_researcher_iterations**: Supervisor planning iterations (default: 4)
- **max_react_tool_calls**: Tool calls per researcher (default: 8)
- **allow_clarification**: Enable user clarification (default: false for evaluation)

#### Model Configuration
- **research_model**: Model for conducting research (default: the value of `model_name`)
- **compression_model**: Model for compressing findings (default: the value of `model_name`)
- **final_report_model**: Model for the final report (default: the value of `model_name`)
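
The defaults and model fallbacks listed above can be sketched as a dataclass. The field names and default values are taken from this document; the methods `resolve_models` and the function `config_from_shared_parameters` are hypothetical helpers, and the actual `DeepResearchConfig` class may be structured differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DeepResearchConfig:
    """Sketch of the documented parameters with their documented defaults."""

    max_concurrent_research_units: int = 3
    max_researcher_iterations: int = 4
    max_react_tool_calls: int = 8
    allow_clarification: bool = False
    # None means "fall back to the shared model_name"
    research_model: Optional[str] = None
    compression_model: Optional[str] = None
    final_report_model: Optional[str] = None

    def resolve_models(self, model_name: str) -> Dict[str, str]:
        """Return the model used for each phase, defaulting to model_name."""
        return {
            "research_model": self.research_model or model_name,
            "compression_model": self.compression_model or model_name,
            "final_report_model": self.final_report_model or model_name,
        }


def config_from_shared_parameters(shared: Dict[str, Any]) -> DeepResearchConfig:
    """Build the config from a parsed `shared_parameters` mapping (assumed layout)."""
    return DeepResearchConfig(**shared.get("deep_research_config", {}))
```

Any key omitted from `deep_research_config` keeps its default, so a config file only needs to list the values it overrides.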

## Workflow Details

### 1. Clarification Phase (Optional)
- Analyzes user input for ambiguity
- Asks clarifying questions if needed
- Usually skipped for evaluation (`allow_clarification: false`)

### 2. Research Brief Generation
- Transforms user question into structured research brief
- Identifies key mathematical concepts to investigate
- Plans research strategy

### 3. Supervisor Coordination
- Breaks down research into focused sub-tasks
- Delegates tasks to parallel researcher agents
- Uses strategic thinking between research rounds
- Coordinates findings from multiple researchers
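
The delegation step can be pictured as bounded parallel fan-out. The sketch below uses plain `asyncio` rather than the actual LangGraph subgraphs, and `run_researcher` and `supervise` are hypothetical names; it only illustrates how `max_concurrent_research_units` might cap the number of researchers running at once.

```python
import asyncio
from typing import List


async def run_researcher(topic: str) -> str:
    """Hypothetical researcher: a real one would call search tools and a model."""
    await asyncio.sleep(0)  # stand-in for I/O-bound work
    return f"findings for {topic}"


async def supervise(sub_tasks: List[str], max_concurrent_research_units: int = 3) -> List[str]:
    """Run one researcher per sub-task, never more than the configured number at once."""
    semaphore = asyncio.Semaphore(max_concurrent_research_units)

    async def bounded(topic: str) -> str:
        async with semaphore:
            return await run_researcher(topic)

    # gather preserves input order, so findings line up with sub_tasks
    return await asyncio.gather(*(bounded(topic) for topic in sub_tasks))


findings = asyncio.run(supervise(["unit conversion", "rate equation", "total cost"], 2))
```

With three sub-tasks and a limit of 2, the third researcher starts only after one of the first two finishes.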

### 4. Individual Research
- Each researcher focuses on specific aspect
- Uses search tools to find relevant information
- Applies systematic search and analysis
- Compresses findings into structured summaries
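
A single researcher's loop might look like the following sketch: search until the notes are judged sufficient or the `max_react_tool_calls` budget is spent, then compress. The function `research`, its callable parameters, and the de-duplicating "compression" are all illustrative assumptions; in the real workflow both the stopping decision and the compression are made by a model.

```python
from typing import Callable, Dict, List, Union


def research(
    topic: str,
    search: Callable[[str], List[str]],
    is_sufficient: Callable[[List[str]], bool],
    max_react_tool_calls: int = 8,
) -> Dict[str, Union[str, int]]:
    """Collect notes on one topic within a fixed tool-call budget."""
    notes: List[str] = []
    calls = 0
    while calls < max_react_tool_calls and not is_sufficient(notes):
        notes.extend(search(topic))
        calls += 1
    # Stand-in for model-based compression: drop duplicates, keep first-seen order
    summary = "; ".join(dict.fromkeys(notes))
    return {"topic": topic, "tool_calls": calls, "summary": summary}
```

The budget check comes first in the loop condition, so a researcher that never finds enough material still terminates.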

### 5. Final Report Generation
- Synthesizes all research findings
- Creates comprehensive problem solution
- Provides step-by-step mathematical reasoning
- Ends with clear numerical answer
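
Because the report ends with a numerical answer, an evaluator can recover it from the text. The sketch below assumes the last number in the report is the final answer; `extract_final_answer` is a hypothetical helper, and the evaluation system's actual answer parsing may work differently.

```python
import re
from typing import Optional

# Optional sign, digits with optional thousands separators, optional decimal part
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_answer(report: str) -> Optional[float]:
    """Return the last number in the report, or None if it contains no number."""
    matches = _NUMBER.findall(report)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))
```

For example, `extract_final_answer("Total cost is 3 * 4 = 12 dollars. Final answer: 12")` returns `12.0`.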

## Search Integration

The deep research workflow reuses the existing search infrastructure:

- **ChromaDB Integration**: Uses the existing ChromaDB search engine
- **Search Tools**: Leverages `search_information()` and `next_page()` tools
- **Ground Truth Tracking**: Maintains evaluation metrics
- **No External APIs**: No need for external search services
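
To show how a search tool and a paging tool can share state, here is an in-memory stand-in. The tool names `search_information` and `next_page` come from this document, but their real signatures are not given here; the class `PagedSearch`, its parameters, and the keyword matching are assumptions that replace the ChromaDB-backed engine for illustration only.

```python
from typing import List


class PagedSearch:
    """In-memory stand-in for the ChromaDB-backed search engine (hypothetical)."""

    def __init__(self, documents: List[str], page_size: int = 2) -> None:
        self._documents = documents
        self._page_size = page_size
        self._hits: List[str] = []
        self._offset = 0

    def search_information(self, query: str) -> List[str]:
        """Start a new search and return the first page of matches."""
        terms = query.lower().split()
        self._hits = [d for d in self._documents if any(t in d.lower() for t in terms)]
        self._offset = 0
        return self.next_page()

    def next_page(self) -> List[str]:
        """Return the next page of the most recent search; empty when exhausted."""
        page = self._hits[self._offset:self._offset + self._page_size]
        self._offset += len(page)
        return page
```

Each call to either method would count as one tool call against a researcher's `max_react_tool_calls` budget, assuming paging is metered the same way as searching.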

## Comparison with ReAct Agents

| Aspect | ReAct Agent | Deep Research |
|--------|-------------|---------------|
| **Architecture** | Single agent | Multi-agent coordination |
| **Research Approach** | Linear search | Parallel, focused research |
| **Planning** | Reactive | Strategic, multi-step |
| **Problem Decomposition** | Limited | Systematic breakdown |
| **Information Synthesis** | Basic | Advanced compression |
| **Resource Usage** | Lower | Higher (more thorough) |
| **Solution Quality** | Good for simple problems | Better for complex problems |

## Performance Considerations

### Resource Usage
- **Higher Token Consumption**: Multiple models and iterations
- **Increased Latency**: More comprehensive research takes time
- **Parallel Processing**: Multiple researchers run concurrently

### Optimization Tips
- **Conservative Settings**: Start with lower concurrency for evaluation
- **Iteration Limits**: Set reasonable bounds for research iterations
- **Model Selection**: Use efficient models for compression phase

### Recommended Settings

#### For Evaluation (Performance-focused)
```yaml
deep_research_config:
  max_concurrent_research_units: 2
  max_researcher_iterations: 3
  max_react_tool_calls: 6
  allow_clarification: false
```

#### For Thoroughness (Quality-focused)
```yaml
deep_research_config:
  max_concurrent_research_units: 4
  max_researcher_iterations: 5
  max_react_tool_calls: 12
  allow_clarification: false
```
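
To compare the two presets above, a rough worst-case tool-call count per problem can be computed from the three limits. This assumes that every supervisor iteration launches the maximum number of researchers and that each researcher spends its full tool-call budget; how the limits actually compose in the workflow is not confirmed here, and `worst_case_tool_calls` is a hypothetical helper.

```python
def worst_case_tool_calls(
    max_concurrent_research_units: int,
    max_researcher_iterations: int,
    max_react_tool_calls: int,
) -> int:
    """Upper bound on tool calls per problem under the stated assumptions."""
    return max_concurrent_research_units * max_researcher_iterations * max_react_tool_calls


evaluation_bound = worst_case_tool_calls(2, 3, 6)    # 36
thorough_bound = worst_case_tool_calls(4, 5, 12)     # 240
```

Under these assumptions the quality-focused preset allows several times as many tool calls per problem as the evaluation preset, which is worth keeping in mind when estimating token cost and rate-limit pressure.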

## Evaluation Integration

The deep research workflow plugs into the existing evaluation system without changes to metrics or output format:

- **Same Metrics**: Uses existing evaluation metrics
- **Compatible Output**: Produces standard evaluation results
- **Comparative Analysis**: Results can be compared directly with ReAct agents
- **Full Logging**: Complete conversation logs for analysis

## Example Configuration Files

See `configs/deep_research_evaluation.yaml` for:
- Deep research with different parameter settings
- Comparison configurations with ReAct agents
- Multiple model configurations
- Various dataset sizes

## Troubleshooting

### Common Issues

1. **Import Errors**: Ensure all deep research modules are properly imported
2. **Search Engine Missing**: The deep research workflow requires the `search_engine` parameter
3. **Token Limits**: Adjust token limits for different models
4. **Concurrency Issues**: Reduce concurrent units if hitting rate limits

### Debug Settings

Enable detailed logging:
```python
import logging
logging.getLogger("eval.src.agents.deep_research_workflow").setLevel(logging.DEBUG)
```

### Performance Monitoring

Monitor key metrics:
- Research iterations per problem
- Tool calls per researcher
- Token usage across the workflow
- Success rate versus problem complexity

## Future Enhancements

Potential extensions:
- **MCP Tool Integration**: Support for Model Context Protocol tools
- **Web Search Integration**: Optional external search capabilities
- **Custom Research Strategies**: Problem-type specific workflows
- **Advanced Compression**: More sophisticated synthesis methods

## References

- [Open Deep Research Repository](https://github.com/langchain-ai/open_deep_research)
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [Deep Research Evaluation Benchmark](https://huggingface.co/datasets/langchain-ai/deep_research_bench)
