# Rewards Package

**A modular reward system for evaluating AI agent performance across multiple domains.**

## Overview

The Rewards package provides a flexible framework for evaluating AI agent performance in mathematics, code execution, tool usage, and power system analysis.

### Key Features
- 🧩 **Modular Architecture**: Pluggable components for different evaluation types
- 🔧 **Multi-Domain Support**: Mathematics, tools, and power systems
- 📊 **Comprehensive Metrics**: Detailed scoring with component breakdown
- 🔄 **Backward Compatible**: Legacy API support with migration warnings

## Quick Start

```python
from rewards import PowerSystemReward, ToolReward, MathReward

# Power system evaluation
ps_reward = PowerSystemReward(
    weights={
        'load_satisfaction': 1.0,
        'connectivity': 0.5,
        'validation': 0.3
    }
)
scores = ps_reward.evaluate(completions, init_code)

# Tool-based evaluation
tool_reward = ToolReward(
    tools=[calculator_tool, plotter_tool],
    weights={'execution': 0.8, 'format': 0.2}
)

# Math problem evaluation
math_reward = MathReward(
    grading_mode='symbolic',
    timeout=10.0
)
```

## Main Components

### PowerSystemReward
Evaluates power system designs across multiple dimensions:
- **Connectivity**: Generator-to-load network paths
- **Validation**: Basic graph validation
- **Parameters**: Block parameter correctness
- **Conversion**: Pandapower format compatibility
- **Diagnostics**: Electrical validity through power flow
- **Load Satisfaction**: Load serving capability
- **Structure**: Network architecture quality

### ToolReward
Evaluates tool usage and execution:
- Tool execution success
- Output formatting
- XML structure compliance

### MathReward
Evaluates mathematical problem solving:
- Symbolic vs numerical grading
- Answer normalization
- Expression validation

### Block Effectiveness
Evaluates individual block connectivity:
- Type-specific connection requirements
- Three-Phase V-I Measurement blocks
- Custom block evaluators

## Configuration

```python
# Custom weights for different training phases
research_weights = {
    'load_satisfaction': 1.0,
    'connectivity': 0.8,
    'validation': 0.5
}

# Progressive training
phase1_weights = {'structure': 0.4, 'validation': 0.3, 'connectivity': 0.3}
phase2_weights = {'connectivity': 0.5, 'structure': 0.2, 'conversion': 0.2}
phase3_weights = {'diagnostic': 0.35, 'conversion': 0.25, 'load_satisfaction': 0.2}

evaluator = PowerSystemReward(weights=research_weights)
```

## Migration from Legacy

```python
# OLD (deprecated but works)
from rewards.power_system_reward import PowerSystemReward
from rewards.complete_reward import CompleteReward

# NEW (recommended)
from rewards import PowerSystemReward, ToolReward

# Migration helper
from rewards.legacy_compatibility import print_migration_guide
print_migration_guide()
```

## Architecture

```
rewards/
├── core/                   # Base classes and interfaces
├── components/            # Individual reward components
├── implementations/       # Main reward classes
├── math/                 # Mathematical evaluation
└── utils/               # Helper functions
```
