# Iterative Improvement Orchestrator

Configuration-based workflow orchestration framework implementing the **Iterative Task Improvement Pattern**.

**Foundation:** This orchestrator builds on the LLM Agent Framework for structured LLM agent definition and data-driven agent improvement. All LLM agents used in this workflow (Task, Analyzer, and Improver) follow that framework's specifications: each represents a single LLM inference call with structured input processing, prompt templating, and output parsing. The framework's agent improvement capabilities enable the systematic enhancement that drives this iterative pattern.

## Overview

The Iterative Improvement Orchestrator provides a reusable framework for implementing workflows that automatically improve task performance through iterative refinement. It follows a fixed four-agent orchestration pattern with customizable agents and data sources.

## The Iterative Task Improvement Pattern

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                        ITERATIVE TASK IMPROVEMENT PATTERN                  │
└─────────────────────────────────────────────────────────────────────────────┘

Data Sources:
├── dev_data (required)
├── ground_truth (optional)  
└── additional_data (flexible)

┌─────────────────────────────────────────────────────────────────────────────┐
│                              ITERATION N                                   │
│                                                                             │
│  ┌─────────┐    ┌───────────┐    ┌──────────┐    ┌──────────┐              │
│  │  TASK   │───▶│ EVALUATOR │───▶│ ANALYZER │───▶│ IMPROVER │              │
│  │ (LLM)   │    │(non-LLM)  │    │  (LLM)   │    │  (LLM)   │              │
│  └─────────┘    └───────────┘    └──────────┘    └──────────┘              │
│       │              │                │              │                     │
│       ▼              ▼                ▼              ▼                     │
│  task_results   evaluation_    analysis_     improved_task_agent           │
│                 results        results                                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                                    │
                                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            ITERATION N+1                                   │
│                                                                             │
│  ┌─────────┐    ┌───────────┐    ┌──────────┐    ┌──────────┐              │
│  │IMPROVED │───▶│ EVALUATOR │───▶│ ANALYZER │───▶│ IMPROVER │              │
│  │  TASK   │    │(non-LLM)  │    │  (LLM)   │    │  (LLM)   │              │
│  │ (LLM)   │    │           │    │          │    │          │              │
│  └─────────┘    └───────────┘    └──────────┘    └──────────┘              │
└─────────────────────────────────────────────────────────────────────────────┘

Data Flow:
dev_data ──────────────┐
ground_truth ──────────┼─────▶ TASK (LLM) ──▶ task_results
additional_data ───────┘                           │
                                                   ▼
                                            EVALUATOR (non-LLM)
                                                   │
                                                   ▼ evaluation_results
                                            ANALYZER (LLM) ◀── additional_data
                                                   │           iteration_history
                                                   ▼ analysis_results
                                            IMPROVER (LLM) ──▶ improved_task_agent
                                                   ▲
                                            iteration_history
```

## Workflow Components

Every iterative improvement workflow consists of **four required agents**:

### Task Agent (LLM-based)
Performs the primary work to be improved. Implemented as a standard LLM agent with configuration file, Python class, and input schema. Receives input data and produces task results that will be evaluated and improved over iterations.

### Evaluator (User-implemented)
Evaluates task performance against ground truth or success criteria. Unlike the LLM agents, this is a custom Python class you implement with an `evaluate` method. Provides detailed feedback that drives the improvement process.

### Analyzer Agent (LLM-based)
Analyzes task results and evaluation feedback to extract patterns and improvement insights. Implemented as a standard LLM agent that processes performance data and identifies specific areas for enhancement.

### Improver Agent (LLM-based)
Generates improved versions of the task agent based on analysis results. Implemented as a standard LLM agent that modifies task agent configurations, prompts, and implementations to address identified weaknesses.
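
The way the four components hand results to each other can be sketched as a plain loop. This is a minimal illustration, not the orchestrator's actual implementation: the function name `run_iterations` and the callable signatures are assumptions made for this example, and convergence checks, early stopping, and validation are left out.

```python
from typing import Any, Callable, Dict, List


def run_iterations(
    task: Callable[[List[dict]], List[dict]],
    evaluate: Callable[[List[dict]], Dict[str, Any]],
    analyze: Callable[[Dict[str, Any], List[dict]], Dict[str, Any]],
    improve: Callable[[Callable, Dict[str, Any], List[dict]], Callable],
    dev_data: List[dict],
    max_iterations: int,
) -> List[dict]:
    """Run task -> evaluator -> analyzer -> improver once per iteration."""
    history: List[dict] = []
    for iteration in range(max_iterations):
        task_results = task(dev_data)
        evaluation_results = evaluate(task_results)
        # The analyzer sees the iteration history recorded so far.
        analysis_results = analyze(evaluation_results, history)
        history.append(
            {
                "iteration": iteration,
                "evaluation": evaluation_results,
                "analysis": analysis_results,
            }
        )
        # The improver returns the task agent used in the next iteration.
        task = improve(task, analysis_results, history)
    return history
```

Each callable here stands in for one agent; in the real workflow the three LLM agents are loaded from the paths given in the workflow configuration.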

## Workflow Configuration

The workflow is driven by a JSON configuration file with the following required sections:

### Required Fields

#### `data_sources` (required)
Defines input data for the workflow:
```json
{
  "data_sources": {
    "dev_data": {
      "path": "data/problems.json",
      "description": "Development dataset for task execution"
    },
    "ground_truth": {
      "path": "data/answers.json",
      "description": "Ground truth for evaluation (optional)"
    },
    "additional_data": {
      "path": "data/context.json",
      "description": "Additional context data (optional)"
    }
  }
}
```

#### `agents` (required)
Specifies all four required agents:
```json
{
  "agents": {
    "task": {
      "type": "llm_agent",
      "config_path": "agents/task/config.json",
      "agent_path": "agents/task/agent.py",
      "input_schema_path": "agents/task/input_schema.json",
      "num_runs": 3
    },
    "evaluator": {
      "type": "computational",
      "agent_path": "evaluator.py"
    },
    "analyzer": {
      "type": "llm_agent",
      "config_path": "agents/analyzer/config.json",
      "agent_path": "agents/analyzer/agent.py",
      "input_schema_path": "agents/analyzer/input_schema.json"
    },
    "improver": {
      "type": "llm_agent",
      "config_path": "agents/improver/config.json",
      "agent_path": "agents/improver/agent.py",
      "input_schema_path": "agents/improver/input_schema.json"
    }
  }
}
```

#### `max_iterations` (required)
Maximum number of improvement iterations:
```json
{
  "max_iterations": 6
}
```

#### `convergence_threshold` (required)
Threshold for determining convergence:
```json
{
  "convergence_threshold": 0.01
}
```
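
This README does not state how the threshold is applied. One plausible reading, shown here purely as an assumption, is that the workflow is considered converged once the overall score changes by less than the threshold between two consecutive iterations. The helper name `has_converged` is hypothetical.

```python
from typing import Sequence


def has_converged(scores: Sequence[float], threshold: float) -> bool:
    """Return True when the latest score moved by less than `threshold`.

    Assumption: convergence compares only the last two overall scores.
    """
    if len(scores) < 2:
        return False  # nothing to compare against yet
    return abs(scores[-1] - scores[-2]) < threshold
```

With `convergence_threshold` set to `0.01`, scores of `0.80` followed by `0.805` would count as converged under this reading, while `0.80` followed by `0.85` would not.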

#### `early_stop_condition` (required)
Condition for early stopping (Python expression):
```json
{
  "early_stop_condition": "overall_score >= 0.95"
}
```
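
The README says only that the condition is a Python expression; how the orchestrator evaluates it is not documented here. The sketch below shows one way such an expression could be checked against evaluator output, assuming that numeric top-level fields such as `overall_score` are exposed as variable names. The function `should_stop_early` is hypothetical.

```python
def should_stop_early(condition: str, evaluation_results: dict) -> bool:
    """Evaluate a condition such as "overall_score >= 0.95".

    Assumption: only numeric top-level fields of the evaluation
    results are visible to the expression.
    """
    names = {
        key: value
        for key, value in evaluation_results.items()
        if isinstance(value, (int, float))
    }
    # Removing builtins narrows what the expression can reach, but this
    # is not a security sandbox; only use trusted configuration files.
    return bool(eval(condition, {"__builtins__": {}}, names))
```

Because the expression is executed as code, the configuration file should be treated as trusted input.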

#### `output_structure` (required)
Defines output directory:
```json
{
  "output_structure": {
    "base_dir": "improvement_outputs"
  }
}
```

#### Per-Agent Static Files (optional)
Defines static files for specific agents:
```json
{
  "agents": {
    "task": {
      "type": "llm_agent",
      "config_path": "agents/task/config.json",
      "agent_path": "agents/task/agent.py",
      "static_files": {
        "icl_examples_data": {
          "path": "data/icl_examples.json"
        }
      }
    },
    "analyzer": {
      "type": "llm_agent",
      "config_path": "agents/analyzer/config.json",
      "agent_path": "agents/analyzer/agent.py",
      "static_files": {
        "agent_framework_guide": {
          "path": "../../llm_agent/README.md"
        }
      }
    }
  }
}
```
Files are loaded once at startup and provided as input data to the specific agent that declares them.
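
As an illustration of what loading these files could look like, the sketch below reads every entry under `static_files` for one agent. It assumes that `.json` files are parsed and everything else is passed through as text, and that paths are resolved relative to a base directory; neither detail is confirmed by this README, and `load_static_files` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_static_files(agent_config: dict, base_dir: Path) -> Dict[str, Any]:
    """Load each declared static file, keyed by its name in the config."""
    loaded: Dict[str, Any] = {}
    for name, spec in agent_config.get("static_files", {}).items():
        path = base_dir / spec["path"]
        text = path.read_text(encoding="utf-8")
        # Assumption: JSON files are parsed, other files are kept as text.
        loaded[name] = json.loads(text) if path.suffix == ".json" else text
    return loaded
```

The returned dictionary can then be merged into the input data of the agent that declared the files.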

### Optional Fields

**`validation_retry_limit`** - Number of retries for validation failures (default: 5):
```json
{
  "validation_retry_limit": 5
}
```

**`id_field`** - Field name for problem IDs (default: "id"):
```json
{
  "id_field": "problem_id"
}
```

**`num_runs`** - Number of times to run the task agent per problem (default: 1):
```json
{
  "agents": {
    "task": {
      "num_runs": 3
    }
  }
}
```
Collects multiple outputs per problem: `{"id": problem_id, "model_outputs": [run1, run2, run3]}`

### Complete Example
```json
{
  "data_sources": {
    "dev_data": {
      "path": "data/aime_problems.json",
      "description": "AIME 2024 math problems"
    },
    "ground_truth": {
      "path": "data/aime_answers.json",
      "description": "AIME 2024 correct answers"
    }
  },

  "agents": {
    "task": {
      "type": "llm_agent",
      "config_path": "agents/math_solver/config.json",
      "agent_path": "agents/math_solver/agent.py",
      "input_schema_path": "agents/math_solver/input_schema.json",
      "num_runs": 3
    },
    "evaluator": {
      "type": "computational",
      "agent_path": "math_evaluator.py"
    },
    "analyzer": {
      "type": "llm_agent",
      "config_path": "agents/performance_analyzer/config.json",
      "agent_path": "agents/performance_analyzer/agent.py",
      "input_schema_path": "agents/performance_analyzer/input_schema.json",
      "data_contract": {
        "problem_id_field": "problem_id",
        "score_field": "accuracy",
        "score_range": {"min": 0.0, "max": 1.0}
      },
      "static_files": {
        "agent_framework_guide": {
          "path": "../../llm_agent/README.md"
        },
        "iterative_improvement_guide": {
          "path": "../../iterative_improvement/README.md"
        }
      }
    },
    "improver": {
      "type": "llm_agent",
      "config_path": "agents/agent_improver/config.json",
      "agent_path": "agents/agent_improver/agent.py",
      "input_schema_path": "agents/agent_improver/input_schema.json",
      "static_files": {
        "agent_framework_guide": {
          "path": "../../llm_agent/README.md"
        },
        "iterative_improvement_guide": {
          "path": "../../iterative_improvement/README.md"
        }
      }
    }
  },
  "max_iterations": 6,
  "convergence_threshold": 0.01,
  "early_stop_condition": "overall_score >= 1.0",
  "output_structure": {
    "base_dir": "improvement_outputs"
  }
}
```

## How the Workflow Orchestrator Works

The `WorkflowOrchestrator` handles several operations beyond what is visible in the configuration:

### Automatic Experiment Management
```
WorkflowOrchestrator(config) →
  Creates: improvement_outputs/{dataset}_runs{N}_iter{M}_{timestamp}/
  Saves: experiment_metadata.json, complete_history.json, inference_logs/
```
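
The directory pattern above can be illustrated with a small helper. Several details are assumptions made for this sketch: that `{dataset}` is the file stem of the `dev_data` path, that `N` is the task agent's `num_runs`, that `M` is `max_iterations`, and that the timestamp uses the `YYYYMMDD_HHMMSS` layout. The function `experiment_dir` is hypothetical.

```python
from datetime import datetime
from pathlib import Path


def experiment_dir(config: dict, now: datetime) -> Path:
    """Build {base_dir}/{dataset}_runs{N}_iter{M}_{timestamp}."""
    # Assumption: the dataset name is the dev_data file name without suffix.
    dataset = Path(config["data_sources"]["dev_data"]["path"]).stem
    num_runs = config["agents"]["task"].get("num_runs", 1)
    max_iterations = config["max_iterations"]
    # Assumption: timestamp layout; the real format is not documented here.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    name = f"{dataset}_runs{num_runs}_iter{max_iterations}_{stamp}"
    return Path(config["output_structure"]["base_dir"]) / name
```

Passing the current time in as an argument keeps the helper deterministic and easy to test.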

### Multi-Run Task Execution
```
For each problem in dev_data:
  Run task agent N times (num_runs from config)
  Collect all outputs: {"id": problem_id, "model_outputs": [run1, run2, ...]}
```
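
A minimal sketch of this collection step follows. The callable `run_task` stands in for a single task agent call and `run_task_multiple_times` is a hypothetical name; the output record shape and the `id_field` default mirror what this README describes.

```python
from typing import Any, Callable, Dict, List


def run_task_multiple_times(
    run_task: Callable[[dict], Any],
    dev_data: List[dict],
    num_runs: int = 1,
    id_field: str = "id",
) -> List[Dict[str, Any]]:
    """Run the task agent `num_runs` times per problem and group outputs."""
    return [
        {
            "id": problem[id_field],
            "model_outputs": [run_task(problem) for _ in range(num_runs)],
        }
        for problem in dev_data
    ]
```

For example, `num_runs=3` yields three entries in `model_outputs` for each problem, which lets the evaluator score consistency as well as correctness.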

### Intelligent Agent Path Resolution
```
Iteration 0: Use original agents from config
Iteration N: Use improved_agents/iteration_{N:03d}_task/ (if exists)
Fallback: Use most recent working iteration or original
```
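
The resolution rule can be sketched as a backward search through the improved-agent directories. In this illustration, a directory that exists is treated as "working", which is a simplification; the helper name `resolve_task_agent_dir` and its parameters are assumptions.

```python
from pathlib import Path


def resolve_task_agent_dir(
    iteration: int, improved_root: Path, original_dir: Path
) -> Path:
    """Pick the task agent directory to use for the given iteration."""
    # Walk back from the requested iteration to the most recent one on disk.
    for n in range(iteration, 0, -1):
        candidate = improved_root / f"iteration_{n:03d}_task"
        if candidate.is_dir():
            return candidate
    # Iteration 0, or no improved agent found: use the original agent.
    return original_dir
```

Here `improved_root` would point at the `improved_agents/` directory inside the experiment output.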

### Validation Retry Logic
```
if improver generates invalid agent:
  retry up to validation_retry_limit times
  send errors back to improver for fixes
  fallback to previous working iteration if all retries fail
```
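
The retry logic above could look roughly like the following. The sketch assumes one initial attempt followed by up to `retry_limit` retries, and it represents the improver and the validator as plain callables; `improve_with_retries` is a hypothetical name, not part of the framework's API.

```python
from typing import Any, Callable, List


def improve_with_retries(
    generate: Callable[[List[str]], Any],
    validate: Callable[[Any], List[str]],
    previous_agent: Any,
    retry_limit: int = 5,
) -> Any:
    """Ask the improver for an agent, feeding validation errors back."""
    errors: List[str] = []
    # Assumption: one initial attempt plus up to `retry_limit` retries.
    for _ in range(1 + retry_limit):
        candidate = generate(errors)  # errors from the last attempt, if any
        errors = validate(candidate)
        if not errors:
            return candidate
    # All attempts failed: fall back to the previous working agent.
    return previous_agent
```

Passing the previous errors into `generate` is what lets the improver correct its own output instead of repeating the same mistake.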

### Best Agent Tracking
```
Track highest scoring agent across all iterations
Provide best agent data to improver for reference
Save best agents to best_agents/ directory
```
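
Best-agent tracking can be illustrated with a small record that is updated after each evaluation. The class `BestAgentTracker` is a hypothetical sketch; in particular, keeping the earlier iteration on a tie is an assumption, and copying files into `best_agents/` is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestAgentTracker:
    """Remember which iteration produced the highest overall score."""

    best_score: Optional[float] = None
    best_iteration: Optional[int] = None

    def update(self, iteration: int, overall_score: float) -> bool:
        """Record a new result; return True if it became the best so far."""
        # Assumption: ties keep the earlier iteration.
        if self.best_score is None or overall_score > self.best_score:
            self.best_score = overall_score
            self.best_iteration = iteration
            return True
        return False
```

A `True` return value is the natural point at which to save the agent and to refresh the reference data handed to the improver.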

## Quick Start

```python
from llm_framework.iterative_improvement.core import WorkflowOrchestrator

# Create and run workflow
orchestrator = WorkflowOrchestrator("workflow_config.json")
results = orchestrator.run_workflow()
```

See the complete configuration example above for the required `workflow_config.json` structure.

## Implementation Steps

### 1. Create Project Structure
```
your_project/
├── workflow_config.json          # Your workflow configuration
├── agents/
│   ├── task/
│   │   ├── task_config.json
│   │   ├── task_agent.py
│   │   └── task_input_schema.json
│   ├── analyzer/
│   │   ├── analyzer_config.json
│   │   ├── analyzer_agent.py
│   │   └── analyzer_input_schema.json
│   └── improver/
│       ├── improver_config.json
│       ├── improver_agent.py
│       └── improver_input_schema.json
├── evaluator.py                  # Your evaluator implementation
└── data/
    ├── dev_data.json
    └── ground_truth.json
```

### 2. Implement Required Components

#### Evaluator (Computational)
```python
class YourEvaluator:
    def evaluate(self, task_results, ground_truth):
        # Your evaluation logic
        return {
            "overall_score": 0.85,
            "individual_results": [
                {
                    "problem_id": "prob1",
                    "accuracy": 1.0,  # Score field specified in data_contract
                    # ... other evaluation fields
                }
            ]
        }
```
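
As a more concrete illustration, the sketch below scores multi-run task output by exact match. It assumes that `task_results` uses the `{"id": ..., "model_outputs": [...]}` shape described earlier and that `ground_truth` is a list of records with `id` and `answer` fields; the ground truth layout and the class name `ExactMatchEvaluator` are assumptions for this example.

```python
class ExactMatchEvaluator:
    """Score each problem as the fraction of runs matching the answer."""

    def evaluate(self, task_results, ground_truth):
        # Assumption: ground truth records look like {"id": ..., "answer": ...}.
        answers = {item["id"]: item["answer"] for item in ground_truth}
        individual_results = []
        for result in task_results:
            outputs = result["model_outputs"]
            expected = answers[result["id"]]
            num_correct = sum(1 for output in outputs if output == expected)
            individual_results.append(
                {
                    "problem_id": result["id"],
                    "accuracy": num_correct / len(outputs) if outputs else 0.0,
                }
            )
        overall_score = (
            sum(r["accuracy"] for r in individual_results) / len(individual_results)
            if individual_results
            else 0.0
        )
        return {
            "overall_score": overall_score,
            "individual_results": individual_results,
        }
```

The `problem_id` and `accuracy` keys match the field names used in the `data_contract` examples in this README.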

#### Analyzer Data Contract
The analyzer requires a `data_contract` in the workflow config that specifies how to interpret evaluator output:
```json
{
  "agents": {
    "analyzer": {
      "type": "llm_agent",
      "config_path": "agents/analyzer/config.json",
      "agent_path": "agents/analyzer/agent.py",
      "input_schema_path": "agents/analyzer/input_schema.json",
      "data_contract": {
        "problem_id_field": "problem_id",
        "score_field": "accuracy", 
        "score_range": {"min": 0.0, "max": 1.0}
      },
      "analyzer_config": {
        "num_correct_examples": 3,
        "num_incorrect_examples": 5
      }
    }
  }
}
```

**Required data_contract fields:**
- `problem_id_field`: Field name for problem identifier in evaluation results
- `score_field`: Field name for the numeric score. A score equal to the maximum of `score_range` counts as correct; anything lower counts as incorrect
- `score_range`: Min/max values for the score field

**How it works:**
1. Workflow orchestrator reads `data_contract` from config
2. Passes it to analyzer agent's config at runtime
3. Analyzer uses field names to parse evaluation results
4. No redundant schema definitions needed
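
To make the contract concrete, the sketch below uses the three `data_contract` fields to separate correct from incorrect problems in the evaluator's `individual_results`. It is an illustration of the idea rather than the analyzer's actual code, and `split_by_contract` is a hypothetical helper.

```python
from typing import List, Tuple


def split_by_contract(
    individual_results: List[dict], data_contract: dict
) -> Tuple[list, list]:
    """Return (correct_ids, incorrect_ids) according to the data contract."""
    id_field = data_contract["problem_id_field"]
    score_field = data_contract["score_field"]
    max_score = data_contract["score_range"]["max"]
    correct, incorrect = [], []
    for result in individual_results:
        # A score at the range maximum is correct; anything lower is not.
        bucket = correct if result[score_field] >= max_score else incorrect
        bucket.append(result[id_field])
    return correct, incorrect
```

Because the field names come from the contract, the same helper works for an evaluator that reports, say, `f1` instead of `accuracy`.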

#### LLM Agents (Task, Analyzer, Improver)
Implement using the LLM agent framework. See `examples/aime2024_improvement/` for complete implementations.

### 3. Run Workflow
```python
from llm_framework.iterative_improvement.core import WorkflowOrchestrator

orchestrator = WorkflowOrchestrator("workflow_config.json")
results = orchestrator.run_workflow()
```

## Error Handling

The framework includes automatic error recovery:
- **Config validation errors**: Invalid configurations are sent back to the improver for fixes
- **JSON/Python syntax errors**: Malformed code is sent back to the improver for correction
- **Retry limit**: Configurable retries (default: 5) prevent infinite loops
- **Fallback mechanism**: The previous working iteration is used if all retries fail

### Error Handling
- Validate configurations before running workflows
- Implement proper error handling in custom evaluators
- Use the provided validation tools to catch issues early
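
A pre-flight check of the workflow configuration might look like the sketch below. It is not the framework's validation tooling; it only checks for the required fields and agents listed in this README, and the names `REQUIRED_FIELDS`, `REQUIRED_AGENTS`, and `find_config_problems` are introduced here for illustration.

```python
from typing import List

# Required top-level fields and agents, as listed in this README.
REQUIRED_FIELDS = (
    "data_sources",
    "agents",
    "max_iterations",
    "convergence_threshold",
    "early_stop_condition",
    "output_structure",
)
REQUIRED_AGENTS = ("task", "evaluator", "analyzer", "improver")


def find_config_problems(config: dict) -> List[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in config
    ]
    agents = config.get("agents", {})
    problems += [
        f"missing agent: {name}" for name in REQUIRED_AGENTS if name not in agents
    ]
    # dev_data is the only data source marked as required.
    if "dev_data" not in config.get("data_sources", {}):
        problems.append("missing data source: dev_data")
    return problems
```

Running a check like this before constructing the orchestrator surfaces missing sections immediately instead of partway through an iteration.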



