# s5_generate_system_output

System output generation tool for DriveGuard evaluation. It runs individual DriveGuard components and saves their outputs so they can be evaluated.

## Overview

This tool processes dashcam videos through various DriveGuard components to generate structured outputs that can be used for evaluation and comparison. It supports different models and provides organized output storage with comprehensive tracking.

### 🎯 **Implementation Status**
- ✅ **annotation**: VideoAnnotator component - **FULLY IMPLEMENTED**
- ✅ **scene**: Scene extraction component - **FULLY IMPLEMENTED** 
- ✅ **violation**: Traffic violation detection - **FULLY IMPLEMENTED** 
- ✅ **accident**: Accident risk assessment - **FULLY IMPLEMENTED & TESTED**
- ✅ **assessment**: Overall driving assessment - **NEWLY IMPLEMENTED & TESTED** 🚀

**Latest Achievement**: The accident risk assessment component is implemented and tested with multi-scene processing, Neo4j integration, and gateway model support, and the comprehensive driving assessment component now builds on its results.

## Architecture

```
s5_generate_system_output/
├── __init__.py                 # Package initialization
├── __main__.py                 # Module entry point  
├── main.py                     # CLI interface
├── config.py                   # Configuration management
├── generators/                 # Component generators
│   ├── __init__.py
│   ├── base_generator.py       # Abstract base class
│   ├── video_annotator.py      # VideoAnnotator generator
│   ├── scene_extractor.py      # SceneExtractor generator
│   └── violation_checker.py    # ViolationChecker generator
└── utils/                      # Utility modules
    ├── __init__.py
    ├── file_manager.py          # File operations
    └── model_tracker.py         # Model tracking
```

## Features

### Supported Components
- **annotation**: VideoAnnotator for dashcam video analysis
- **scene**: Scene extraction from ground truth annotations  
- **violation**: Traffic violation detection using Milvus database
- **accident**: Accident risk assessment using Neo4j database ✅ **IMPLEMENTED**
- **assessment**: Comprehensive driving safety assessment ✅ **NEWLY IMPLEMENTED** 🚀

### Model Support

**Comprehensive Model Compatibility**: Supports more than 20 models, including:

**OpenAI Models** (with structured output):
- GPT-4o (openai:gpt-4o)  
- GPT-4.1 (openai:gpt-4.1)
- GPT-5 (openai:gpt-5)
- O3 (openai:o3)

**Groq Models** (with structured output):  
- Llama 3.3 70B (groq:llama-3.3-70b-versatile)
- DeepSeek R1 (groq:deepseek-r1-distill-llama-70b)

**Gateway Models** (with text parsing fallback):
- Claude Opus (gateway:anthropic/claude-opus-4-1-20250805)
- Claude Sonnet (gateway:anthropic/claude-sonnet-4-20250514) 
- Gemini 2.5 Pro (gateway:google-ai-studio/gemini-2.5-pro)
- Perplexity Sonar (gateway:perplexity/sonar-pro)
- And many more via gateway routing

**Key Features**:
- **Automatic Fallback**: Gateway models use intelligent text parsing when structured output isn't supported
- **Backward Compatibility**: OpenAI/Groq models continue using reliable structured output  
- **Universal Support**: All models in `evaluation/models/text.txt` and `evaluation/models/annotation.txt` are supported
- **Multi-Scene Processing**: All text-based components (scene, violation, accident) process multiple scenes per ground truth file
- **Comprehensive Testing**: All components tested and verified with multi-model support

### Output Organization
Outputs are saved in an organized directory structure, one folder per component and one subfolder per model:
```
data/evaluation/system_outputs/
├── annotation/
│   ├── gpt-4o-2024-11-20/      # Model-specific folders
│   │   ├── 0001_bridge_tunnel_collision_accident_Xv25oDIoZvs_0045.json
│   │   ├── 0002_pedestrian_conflicts_collision_accident_1HEAdeLaF7Y_0013.json
│   │   └── ...
│   └── gpt-4-turbo-2024-04-09/
├── scene/
├── violation/
├── accident/
└── assessment/
```
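As an illustration of the layout above, the following sketch builds the path of one output file. The helper name `output_path` is hypothetical, and the model folder name is passed in as-is because this README does not describe how a model ID such as `openai:gpt-4o` maps to a folder name.

```python
from pathlib import Path

# Root directory taken from the tree shown above.
OUTPUT_ROOT = Path("data/evaluation/system_outputs")


def output_path(component: str, model_folder: str, video_id: str,
                root: Path = OUTPUT_ROOT) -> Path:
    """Build <root>/<component>/<model_folder>/<video_id>.json (hypothetical helper)."""
    return root / component / model_folder / f"{video_id}.json"


example = output_path(
    "annotation",
    "gpt-4o-2024-11-20",
    "0001_bridge_tunnel_collision_accident_Xv25oDIoZvs_0045",
)
print(example.as_posix())
```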

## Usage

### Basic Commands

**Generate outputs for all models (multi-model processing):**
```bash
# For annotation component (reads from evaluation/models/annotation.txt)
uv run python -m evaluation.make_dataset.s5_generate_system_output --component annotation

# For scene component (reads from evaluation/models/text.txt)
uv run python -m evaluation.make_dataset.s5_generate_system_output --component scene

# For violation component (reads from evaluation/models/text.txt)  
uv run python -m evaluation.make_dataset.s5_generate_system_output --component violation

# For accident component (reads from evaluation/models/text.txt)
uv run python -m evaluation.make_dataset.s5_generate_system_output --component accident

# For assessment component (reads from evaluation/models/text.txt)
uv run python -m evaluation.make_dataset.s5_generate_system_output --component assessment
```

This will process all models listed in the respective model files sequentially. The tool will:
- Read all models from the appropriate file (annotation.txt or text.txt)
- Process each model one by one
- Provide a summary report at the end showing successful and failed models
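
The loop described above can be sketched roughly as follows. This is not the tool's actual code: the function names are hypothetical, and the file format (one model ID per line, with blank lines and `#` comments ignored) is an assumption.

```python
from typing import Callable


def parse_model_ids(text: str) -> list[str]:
    """Return model IDs from file content, assuming one ID per line.

    Skipping blank lines and '#' comments is an assumption about the file format.
    """
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def run_all_models(model_ids: list[str],
                   run_one: Callable[[str], None]) -> dict[str, list[str]]:
    """Run models one by one and collect which succeeded and which failed."""
    summary: dict[str, list[str]] = {"succeeded": [], "failed": []}
    for model_id in model_ids:
        try:
            run_one(model_id)
        except Exception as exc:  # one failing model should not stop the rest
            print(f"[failed] {model_id}: {exc}")
            summary["failed"].append(model_id)
        else:
            summary["succeeded"].append(model_id)
    return summary


def fake_run(model_id: str) -> None:
    """Stand-in for real generation; simulates a failure for Groq models."""
    if model_id.startswith("groq:"):
        raise RuntimeError("simulated failure")


models = parse_model_ids("# text models\nopenai:gpt-4o\n\ngroq:llama-3.3-70b-versatile\n")
summary = run_all_models(models, fake_run)
print(summary)
```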

**Generate outputs for a single model:**
```bash
# Annotation component
uv run python -m evaluation.make_dataset.s5_generate_system_output --component annotation --model "openai:gpt-4o"

# Scene extraction component
uv run python -m evaluation.make_dataset.s5_generate_system_output --component scene --model "openai:gpt-4o"

# Violation detection component
uv run python -m evaluation.make_dataset.s5_generate_system_output --component violation --model "openai:gpt-4o"

# Accident risk assessment component
uv run python -m evaluation.make_dataset.s5_generate_system_output --component accident --model "openai:gpt-4o"

# Comprehensive driving assessment component
uv run python -m evaluation.make_dataset.s5_generate_system_output --component assessment --model "openai:gpt-4o"
```

**Generate for specific videos only:**
```bash
# Single model, specific videos
uv run python -m evaluation.make_dataset.s5_generate_system_output --component annotation --model "openai:gpt-4o" --videos "1,2,3"
uv run python -m evaluation.make_dataset.s5_generate_system_output --component scene --model "openai:gpt-4o" --videos "1,2,3"
uv run python -m evaluation.make_dataset.s5_generate_system_output --component violation --model "openai:gpt-4o" --videos "1,2,3"
uv run python -m evaluation.make_dataset.s5_generate_system_output --component accident --model "openai:gpt-4o" --videos "1,2,3"
uv run python -m evaluation.make_dataset.s5_generate_system_output --component assessment --model "openai:gpt-4o" --videos "1,2,3"

# Multi-model processing for specific videos
uv run python -m evaluation.make_dataset.s5_generate_system_output --component annotation --videos "1,2,3"
uv run python -m evaluation.make_dataset.s5_generate_system_output --component scene --videos "1,2,3"
uv run python -m evaluation.make_dataset.s5_generate_system_output --component violation --videos "1,2,3"
uv run python -m evaluation.make_dataset.s5_generate_system_output --component accident --videos "1,2,3"
uv run python -m evaluation.make_dataset.s5_generate_system_output --component assessment --videos "1,2,3"
```
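
This README does not spell out how the values passed to `--videos` are matched to files. The sketch below shows one plausible interpretation, under the assumption that each number refers to the numeric prefix of a video ID (for example `1` for `0001_...`). Both function names are hypothetical.

```python
def parse_video_numbers(spec: str) -> set[int]:
    """Turn a string such as "1,2,3" into a set of integers."""
    return {int(part) for part in spec.split(",") if part.strip()}


def select_videos(video_ids: list[str], spec: str) -> list[str]:
    """Keep video IDs whose leading number is listed in spec.

    Assumption: the number before the first underscore identifies the video.
    """
    wanted = parse_video_numbers(spec)
    selected = []
    for video_id in video_ids:
        prefix = video_id.split("_", 1)[0]
        if prefix.isdigit() and int(prefix) in wanted:
            selected.append(video_id)
    return selected


all_ids = [
    "0000_cut_off_accident",
    "0001_bridge_tunnel_collision_accident_Xv25oDIoZvs_0045",
    "0002_pedestrian_conflicts_collision_accident_1HEAdeLaF7Y_0013",
]
chosen = select_videos(all_ids, "1,2,3")
print(chosen)
```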

**Check overall status:**
```bash
uv run python -m evaluation.make_dataset.s5_generate_system_output --status
```

**Check specific component status:**
```bash
uv run python -m evaluation.make_dataset.s5_generate_system_output --status --component annotation --model "openai:gpt-4o"
uv run python -m evaluation.make_dataset.s5_generate_system_output --status --component scene --model "openai:gpt-4o"
uv run python -m evaluation.make_dataset.s5_generate_system_output --status --component violation --model "openai:gpt-4o"
uv run python -m evaluation.make_dataset.s5_generate_system_output --status --component accident --model "openai:gpt-4o"
uv run python -m evaluation.make_dataset.s5_generate_system_output --status --component assessment --model "openai:gpt-4o"
```

### Advanced Options

**Overwrite existing outputs:**
```bash
uv run python -m evaluation.make_dataset.s5_generate_system_output --component annotation --model gpt-4o-2024-11-20 --overwrite
```

**Custom frames per second:**
```bash
uv run python -m evaluation.make_dataset.s5_generate_system_output --component annotation --model gpt-4o-2024-11-20 --fps 1
```

**Custom project root:**
```bash
uv run python -m evaluation.make_dataset.s5_generate_system_output --component annotation --model gpt-4o-2024-11-20 --project-root /path/to/project
```

## Output Format

Each generated file contains structured output with metadata:

### Annotation Component Output
```json
{
  "video_id": "0001_bridge_tunnel_collision_accident_Xv25oDIoZvs_0045",
  "video_path": "/path/to/video.mp4",
  "model_id": "gpt-4o-2024-11-20",
  "component": "annotation",
  "generated_at": "2024-01-15T10:30:00Z",
  "generation_time": 12.5,
  "content": "Generated annotation content here...",
  "metadata": {
    "fps": 2,
    "frame_count": 24,
    "model_type": "multimodal",
    "generator_version": "1.0.0"
  }
}
```

### Scene Component Output
```json
{
  "video_id": "0001_bridge_tunnel_collision_accident_Xv25oDIoZvs_0045",
  "video_path": "/path/to/video.mp4",
  "model_id": "openai:gpt-4o",
  "component": "scene",
  "generated_at": "2024-01-15T10:30:00Z",
  "generation_time": 2.1,
  "content": [
    "Scene 1: Vehicle approaches intersection at normal speed...",
    "Scene 2: Traffic light changes from green to yellow...",
    "Scene 3: Vehicle makes sudden lane change without signaling..."
  ],
  "metadata": {
    "scene_count": 3,
    "model_type": "text",
    "prompt_type": "scene_extraction",
    "source": "ground_truth_annotation",
    "generator_version": "1.0.0"
  }
}
```

### Violation Component Output
```json
{
  "video_id": "0000_cut_off_accident",
  "video_path": "/path/to/video.mp4",
  "model_id": "gateway:perplexity/sonar-pro",
  "component": "violation",
  "generated_at": "2024-01-15T10:30:00Z",
  "generation_time": 7.0,
  "content": [
    {
      "scene": "Silver sedan in the right lane cuts into ego vehicle's lane",
      "violation": "found",
      "reason": "The silver sedan changed lanes without checking its blind spots and making sure it was safe to do so, which is a traffic rule violation.",
      "processing_time": 1.6
    },
    {
      "scene": "Ego vehicle steers slightly left to avoid collision",
      "violation": "not_found", 
      "reason": "No traffic rule violation found",
      "processing_time": 1.1
    }
  ],
  "metadata": {
    "scene_count": 2,
    "violation_count": 1,
    "avg_scene_processing_time": 1.35,
    "model_type": "text",
    "prompt_type": "traffic_rule_checking",
    "source": "ground_truth_scenes",
    "generator_version": "1.0.0"
  }
}
```
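
The `metadata` counters in the example above follow directly from the `content` list. A minimal sketch of that arithmetic, using a hypothetical helper name and only the fields shown in the example:

```python
from statistics import mean


def summarize_violations(content: list[dict]) -> dict:
    """Derive scene_count, violation_count and the average per-scene time."""
    times = [scene["processing_time"] for scene in content]
    return {
        "scene_count": len(content),
        "violation_count": sum(1 for scene in content if scene["violation"] == "found"),
        "avg_scene_processing_time": round(mean(times), 3) if times else 0.0,
    }


# Values copied from the violation example above (reasons omitted for brevity).
content = [
    {"scene": "Silver sedan in the right lane cuts into ego vehicle's lane",
     "violation": "found", "processing_time": 1.6},
    {"scene": "Ego vehicle steers slightly left to avoid collision",
     "violation": "not_found", "processing_time": 1.1},
]
summary = summarize_violations(content)
print(summary)
```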

### Accident Component Output ✅ **TESTED & VERIFIED**
```json
{
  "video_id": "0000_cut_off_accident",
  "video_path": "/Users/lin/Workspace/DriveGuard/data/dashcam/0000_cut_off_accident.mp4",
  "model_id": "openai:gpt-4o",
  "component": "accident",
  "generated_at": "2025-09-24T20:48:52.292046",
  "generation_time": 33.24,
  "content": [
    {
      "scene": "Silver sedan in the right lane cuts into ego vehicle's lane",
      "accident": "found",
      "consequence": "A possible accident is a collision or near-collision between the ego vehicle and the silver sedan due to unsafe merging. Historical data shows that vehicles cutting into another lane without sufficient space or signaling, especially in multi-lane roads, often lead to collisions or near-misses. The main reasons are failure to check blind spots, misjudgment of distance, and lack of proper lane discipline.",
      "processing_time": 4.59
    },
    {
      "scene": "Ego vehicle steers slightly left in its lane to avoid collision after being cut off",
      "accident": "found", 
      "consequence": "A possible accident is a collision between the ego vehicle and another vehicle that cuts into its lane. Historical data shows that when a vehicle fails to yield and abruptly merges or turns across the ego vehicle's path, and the ego vehicle does not take sufficient evasive action, a collision can occur.",
      "processing_time": 10.58
    },
    {
      "scene": "Ego vehicle makes contact with silver sedan after being cut off",
      "accident": "found",
      "consequence": "A collision between the ego vehicle and the silver sedan is a possible accident, primarily due to the silver sedan cutting off the ego vehicle. Historical data shows that unsafe merging or failure to yield frequently leads to collisions or near-collisions.",
      "processing_time": 6.15
    }
  ],
  "metadata": {
    "scene_count": 5,
    "accident_count": 5,
    "avg_scene_processing_time": 6.648,
    "model_type": "text", 
    "prompt_type": "accident_risk_assessment",
    "source": "ground_truth_scenes",
    "generator_version": "1.0.0"
  }
}
```

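**Real Test Results**: Successfully processed 5 scenes in 33.24 seconds with Neo4j-based accident risk analysis for each scene. The `content` array above is abridged to 3 of the 5 scenes, which is why `scene_count` and `accident_count` report 5.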

### Assessment Component Output ✅ **NEWLY IMPLEMENTED & TESTED**
```json
{
  "video_id": "0000_cut_off_accident",
  "video_path": "/Users/lin/Workspace/DriveGuard/data/dashcam/0000_cut_off_accident.mp4",
  "model_id": "openai:gpt-4o",
  "component": "assessment",
  "generated_at": "2025-09-24T21:21:16.840346",
  "generation_time": 8.59,
  "content": {
    "safety_score": 3,
    "overall_evaluation": "The ego vehicle was involved in an accident caused primarily by another driver's unsafe lane change. However, the ego vehicle did not adequately anticipate the increased risk posed by slowing traffic in adjacent lanes and failed to adjust speed or create an escape route, which contributed to the inability to fully mitigate the collision.",
    "strengths": [
      "Maintained lane discipline and steady speed prior to the incident",
      "Attempted an evasive maneuver (steering slightly left) to avoid the collision when cut off",
      "Came to a controlled stop after the collision, minimizing further risk"
    ],
    "weaknesses": [
      "Did not reduce speed or increase following distance when adjacent lanes were slowing",
      "Failed to recognize the lack of escape routes",
      "Reaction to the sudden lane change was insufficient to avoid contact"
    ],
    "improvement_advice": [
      "When approaching intersections where adjacent lanes are slowing, proactively reduce speed and increase following distance",
      "Continuously scan not just your lane but also adjacent lanes for developing hazards",
      "Always ensure you have an escape route or buffer space in multi-lane traffic situations",
      "Practice defensive driving by anticipating mistakes of others and preparing to respond early"
    ],
    "risk_level": "critical"
  },
  "metadata": {
    "fps": 2,
    "generator_version": "1.0.0",
    "safety_score": 3,
    "risk_level": "critical",
    "strengths_count": 3,
    "weaknesses_count": 3,
    "advice_count": 4,
    "model_type": "text",
    "prompt_type": "comprehensive_driving_assessment",
    "source": "ground_truth_violations_and_accidents"
  }
}
```

**Real Test Results**: Successfully generated comprehensive assessment in 8.59 seconds with detailed safety scoring, strengths/weaknesses analysis, and actionable improvement advice.
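
When consuming assessment files, a light sanity check on `content` can catch malformed results early. The sketch below is an illustration, not part of the tool: the function names are hypothetical, and the 1-10 score range and the four risk levels are the ones this README lists for the assessment component.

```python
# Risk levels and score range as described for the assessment component.
ALLOWED_RISK_LEVELS = ("low", "medium", "high", "critical")
LIST_FIELDS = ("strengths", "weaknesses", "improvement_advice")


def validate_assessment(content: dict) -> list[str]:
    """Return a list of problems found in an assessment's content (empty if none)."""
    problems = []
    score = content.get("safety_score")
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        problems.append("safety_score must be an integer from 1 to 10")
    if content.get("risk_level") not in ALLOWED_RISK_LEVELS:
        problems.append("risk_level must be one of " + ", ".join(ALLOWED_RISK_LEVELS))
    for field in LIST_FIELDS:
        if not isinstance(content.get(field), list):
            problems.append(f"{field} must be a list")
    return problems


def count_metadata(content: dict) -> dict:
    """Recompute the *_count metadata fields from the content lists."""
    return {
        "strengths_count": len(content.get("strengths", [])),
        "weaknesses_count": len(content.get("weaknesses", [])),
        "advice_count": len(content.get("improvement_advice", [])),
    }


# Shortened version of the example above; list items are placeholders.
content = {
    "safety_score": 3,
    "overall_evaluation": "...",
    "strengths": ["lane discipline", "evasive maneuver", "controlled stop"],
    "weaknesses": ["speed not reduced", "no escape route", "late reaction"],
    "improvement_advice": ["reduce speed", "scan lanes", "keep a buffer", "drive defensively"],
    "risk_level": "critical",
}
problems = validate_assessment(content)
counts = count_metadata(content)
print(problems, counts)
```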

## Component Details

### Annotation Component
The annotation component processes dashcam videos directly using multimodal LLMs to generate frame-by-frame descriptions. It extracts video frames at the specified FPS and sends them to the model for analysis.
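
To make the FPS setting concrete, here is a minimal sketch of how frame indices could be chosen for a target sampling rate. It is not the VideoAnnotator implementation; the function name is hypothetical, and the 30 fps, 360-frame clip in the usage lines is an illustrative assumption.

```python
def sample_frame_indices(total_frames: int, native_fps: float,
                         target_fps: float) -> list[int]:
    """Pick evenly spaced frame indices so that roughly target_fps frames
    are kept per second of video."""
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("total_frames, native_fps and target_fps must be positive")
    # Never sample more densely than the video itself.
    step = max(native_fps / target_fps, 1.0)
    indices = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


# Assumed clip: 12 seconds at 30 fps, sampled at 2 fps.
indices = sample_frame_indices(total_frames=360, native_fps=30.0, target_fps=2.0)
print(len(indices), indices[:3])
```

Under these assumed numbers the sketch keeps 24 frames, which would be consistent with the `fps` and `frame_count` values in the annotation example for a 12-second clip.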

### Scene Extraction Component
The scene extraction component processes ground truth annotations (not videos directly) to extract atomic scenes. It:

1. **Input**: Reads ground truth annotations from `data/evaluation/ground_truth/{video_id}.json`
2. **Processing**: Uses SceneExtractor agent to break down annotations into discrete, atomic scenes
3. **Output**: Returns a list of scene descriptions that can be used for further analysis
4. **Models**: Uses text-only models from `evaluation/models/text.txt` (no multimodal capability needed)

The scene extraction is particularly useful for:
- Breaking complex driving scenarios into manageable components  
- Creating atomic units for violation and accident analysis
- Improving retrieval accuracy by providing focused scene descriptions

### Violation Detection Component
The violation detection component processes scenes from ground truth annotations to identify traffic rule violations. It:

1. **Input**: Reads ground truth annotations from `data/evaluation/ground_truth/{video_id}.json`
2. **Scene Processing**: Extracts scenes from ground truth and processes each individually
3. **Traffic Rule Retrieval**: Uses Milvus vector database to find relevant traffic rules
4. **Violation Analysis**: Applies traffic rule checker agent to determine violations
5. **Output**: Returns detailed violation results with reasoning and timing data
6. **Models**: Uses text-only models from `evaluation/models/text.txt`

**Key Features**:
- **Multi-Scene Processing**: Each ground truth file may contain multiple scenes processed individually
- **Runtime Tracking**: Detailed timing for both individual scenes and total processing  
- **Smart Retrieval**: Uses embedding-based similarity search to find relevant traffic rules
- **Gateway Model Support**: Intelligent text parsing fallback for models without structured output support
- **Comprehensive Output**: Violation status, detailed reasoning, and performance metrics

The violation component integrates with:
- **Milvus Database**: Vector similarity search for traffic rule retrieval
- **Traffic Rule Checker Agent**: LLM-based violation detection with retrieval grading
- **Multi-Model Pipeline**: Supports all models with automatic fallback handling
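
The per-scene flow and timing described above can be sketched as follows. The retrieval and checking steps are passed in as plain callables because the real Milvus lookup and checker agent are not shown in this README; the stubs and all names here are hypothetical.

```python
import time
from typing import Callable


def check_scenes(
    scenes: list[str],
    retrieve_rules: Callable[[str], list[str]],
    check_violation: Callable[[str, list[str]], tuple[bool, str]],
) -> list[dict]:
    """Process each scene individually and record how long each one took."""
    results = []
    for scene in scenes:
        started = time.perf_counter()
        rules = retrieve_rules(scene)                   # e.g. vector similarity search
        found, reason = check_violation(scene, rules)   # e.g. LLM-based checker
        results.append({
            "scene": scene,
            "violation": "found" if found else "not_found",
            "reason": reason,
            "processing_time": round(time.perf_counter() - started, 2),
        })
    return results


def stub_retrieve(scene: str) -> list[str]:
    """Stand-in for rule retrieval; returns a fixed rule."""
    return ["Change lanes only when it is safe to do so."]


def stub_check(scene: str, rules: list[str]) -> tuple[bool, str]:
    """Stand-in for the checker agent; a keyword match, purely for the demo."""
    if "cuts into" in scene:
        return True, "Unsafe lane change."
    return False, "No traffic rule violation found"


results = check_scenes(
    ["Silver sedan in the right lane cuts into ego vehicle's lane",
     "Ego vehicle steers slightly left to avoid collision"],
    stub_retrieve,
    stub_check,
)
print([r["violation"] for r in results])
```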

### Accident Risk Assessment Component ✅ **FULLY IMPLEMENTED & TESTED**
The accident risk assessment component processes scenes from ground truth annotations to identify potential accident risks based on historical data. It:

1. **Input**: Reads ground truth annotations from `data/evaluation/ground_truth/{video_id}.json`
2. **Scene Processing**: Extracts scenes from ground truth and processes each individually
3. **Historical Accident Retrieval**: Uses Neo4j graph database to find similar accident scenarios
4. **Risk Analysis**: Applies traffic accident retriever agent to assess collision risks
5. **Output**: Returns detailed accident analysis results with consequences and timing data
6. **Models**: Uses text-only models from `evaluation/models/text.txt`

**Key Features**:
- **Multi-Scene Processing**: Each ground truth file may contain multiple scenes processed individually ✅ **VERIFIED**
- **Runtime Tracking**: Detailed timing for both individual scenes and total processing ✅ **VERIFIED**
- **Neo4j Integration**: Graph-based similarity search to find relevant historical accidents ✅ **VERIFIED**
- **Gateway Model Support**: Intelligent text parsing fallback for models without structured output support ✅ **TESTED**
- **Comprehensive Output**: Accident risk status, detailed consequence analysis, and performance metrics ✅ **VERIFIED**

**Test Results**:
- ✅ Successfully processes 5 scenes per ground truth file
- ✅ Individual scene timing (4.6s - 10.6s per scene)
- ✅ Neo4j database connection and retrieval working
- ✅ Comprehensive consequence analysis from historical data
- ✅ Well-formed JSON output with all metadata fields

The accident component integrates with:
- **Neo4j Database**: Graph-based similarity search for historical accident retrieval ✅ **TESTED**
- **Traffic Accident Retriever Agent**: LLM-based accident risk assessment with retrieval grading ✅ **TESTED**
- **Multi-Model Pipeline**: Supports all models with automatic fallback handling ✅ **TESTED**

### Comprehensive Driving Assessment Component ✅ **FULLY IMPLEMENTED & TESTED**
The comprehensive driving assessment component synthesizes all analysis results to provide complete safety evaluation and improvement recommendations. It:

1. **Input**: Reads ground truth annotations with existing violation and accident analysis results
2. **Data Integration**: Uses existing violation detection and accident risk assessment data as input
3. **Comprehensive Analysis**: Applies DrivingMentor agent to generate holistic safety assessment
4. **Detailed Output**: Returns safety scores, strengths, weaknesses, and actionable improvement advice
5. **Models**: Uses text-only models from `evaluation/models/text.txt`

**Key Features**:
- **Holistic Assessment**: Combines annotation, violation, and accident analysis into comprehensive evaluation ✅ **VERIFIED**
- **Safety Scoring**: Provides 1-10 safety scores with detailed justification ✅ **VERIFIED**
- **Structured Feedback**: Categorizes findings into strengths, weaknesses, and improvement advice ✅ **VERIFIED**
- **Risk Classification**: Assigns risk levels (low, medium, high, critical) based on overall assessment ✅ **VERIFIED**
- **Gateway Model Support**: Intelligent text parsing fallback for models without structured output support ✅ **TESTED**
- **Fast Processing**: Efficient synthesis approach (~8-10 seconds per assessment) ✅ **VERIFIED**

**Test Results**:
- ✅ Successfully generates comprehensive assessments for ground truth files
- ✅ Safety scoring (3/10 in test case with detailed justification)
- ✅ Detailed strengths analysis (3 items: lane discipline, evasive action, controlled stop)
- ✅ Specific weaknesses identification (3 items: hazard anticipation, escape routes, reaction timing)
- ✅ Actionable improvement advice (4 detailed recommendations)
- ✅ Appropriate risk classification (critical level for collision scenario)

**Assessment Workflow**:
The assessment component represents the **final synthesis step** in the DriveGuard evaluation pipeline:
1. **Input Data**: Uses existing violation and accident results from ground truth
2. **Analysis Integration**: Combines all findings into coherent assessment
3. **Expert Evaluation**: Applies driving instructor perspective with safety scoring criteria
4. **Actionable Output**: Provides specific, implementable improvement recommendations

The assessment component integrates with:
- **Existing Analysis Results**: Uses violation and accident data from ground truth files ✅ **TESTED**
- **DrivingMentor Agent**: LLM-based comprehensive assessment with expert knowledge ✅ **TESTED**
- **Multi-Model Pipeline**: Supports all listed models with automatic fallback handling ✅ **TESTED**

## Video Processing

The tool automatically discovers and processes all MP4 files in the `data/dashcam/` directory. The dataset currently contains 226 videos across various categories:

- Bridge/tunnel accidents
- Pedestrian conflicts  
- Weather conditions (snow, rain, fog)
- Night driving scenarios
- Road rage incidents
- Emergency vehicle situations
- Traffic violations
- Construction zones
- And more...

## Model Tracking

The tool includes comprehensive tracking of model usage and performance:

- Run history and statistics
- Generation times and success rates
- Model comparison data
- Error tracking and reporting

Tracking data is stored in `data/evaluation/tracking/model_tracking.json`.
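
The layout of `model_tracking.json` is not documented here, so the following is only a sketch of how per-model run statistics could be accumulated. The field names (`runs`, `successes`, `total_seconds`, and so on) and the function name are assumptions, not the tool's schema.

```python
def record_run(stats: dict, model_id: str, succeeded: bool, seconds: float) -> dict:
    """Update in-memory statistics for one generation run (hypothetical schema)."""
    entry = stats.setdefault(
        model_id, {"runs": 0, "successes": 0, "total_seconds": 0.0}
    )
    entry["runs"] += 1
    entry["successes"] += int(succeeded)
    entry["total_seconds"] += seconds
    # Derived values that make model comparison easier.
    entry["success_rate"] = entry["successes"] / entry["runs"]
    entry["avg_seconds"] = entry["total_seconds"] / entry["runs"]
    return entry


stats: dict = {}
record_run(stats, "openai:gpt-4o", succeeded=True, seconds=12.5)
entry = record_run(stats, "openai:gpt-4o", succeeded=False, seconds=7.5)
print(entry)
```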

## Status Monitoring

Use the `--status` flag to monitor progress:

```bash
# Overall status
uv run python -m evaluation.make_dataset.s5_generate_system_output --status

# Specific component/model status  
uv run python -m evaluation.make_dataset.s5_generate_system_output --status --component annotation --model gpt-4o-2024-11-20
```

This shows:
- Total videos found vs processed
- Completion percentages
- Output file validation
- Recent generation activity
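
The "found vs processed" figures can be thought of as a comparison between video IDs and existing output files. A rough sketch, assuming output files are named `<video_id>.json` as in the directory layout shown earlier (the function names are hypothetical):

```python
from pathlib import Path


def ids_in(directory: Path, pattern: str) -> set[str]:
    """Collect file stems matching pattern; an absent directory yields an empty set."""
    if not directory.is_dir():
        return set()
    return {path.stem for path in directory.glob(pattern)}


def completion_stats(video_ids: set[str], output_ids: set[str]) -> dict:
    """Compare discovered videos with existing outputs."""
    processed = video_ids & output_ids  # ignore outputs without a matching video
    total = len(video_ids)
    percent = round(100.0 * len(processed) / total, 1) if total else 0.0
    return {"total": total, "processed": len(processed), "percent": percent}


# In practice the sets might come from, for example:
#   ids_in(Path("data/dashcam"), "*.mp4")
#   ids_in(Path("data/evaluation/system_outputs/annotation/<model-folder>"), "*.json")
status = completion_stats({"a", "b", "c", "d"}, {"a", "b", "x"})
print(status)
```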

## Gateway Model Compatibility

The tool provides comprehensive support for gateway models that don't support structured output:

### Automatic Detection
- Detects gateway models by `gateway:` prefix
- Automatically switches to text parsing for unsupported models
- Maintains structured output for compatible models (OpenAI, Groq)
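
A prefix check of this kind can be as small as the sketch below. The function name is hypothetical, and treating any unrecognized or missing prefix as "text parsing" is an assumption made for the example, not documented behavior.

```python
# Providers that this README lists as using structured output.
STRUCTURED_PROVIDERS = {"openai", "groq"}


def output_mode(model_id: str) -> str:
    """Return "structured" or "text_parsing" based on the model ID prefix."""
    provider, separator, _name = model_id.partition(":")
    if separator and provider in STRUCTURED_PROVIDERS:
        return "structured"
    # "gateway:" models, and (by assumption) anything unrecognized,
    # fall back to text parsing.
    return "text_parsing"


modes = {
    model_id: output_mode(model_id)
    for model_id in (
        "openai:gpt-4o",
        "groq:llama-3.3-70b-versatile",
        "gateway:perplexity/sonar-pro",
    )
}
print(modes)
```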

### Intelligent Text Parsing
**Scene Component**:
- Extracts scenes from natural language responses
- Handles numbered lists, bullet points, and JSON embedded in text
- Fallback sentence splitting for unstructured responses
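
One way such a parser could be layered is shown below: embedded JSON first, then list markers, then sentence splitting. This is a sketch under those assumptions rather than the tool's parser; the function name and regular expressions are illustrative.

```python
import json
import re

# Matches "1. text", "2) text", "- text", "* text" and "• text".
LIST_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.*\S)\s*$")


def parse_scenes(text: str) -> list[str]:
    """Extract scene descriptions from a free-form model response."""
    # 1) A JSON array of strings embedded anywhere in the text.
    match = re.search(r"\[.*\]", text, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = None
        if isinstance(data, list) and data and all(isinstance(x, str) for x in data):
            return [x.strip() for x in data if x.strip()]
    # 2) Numbered or bulleted lines.
    items = []
    for line in text.splitlines():
        item = LIST_ITEM.match(line)
        if item:
            items.append(item.group(1))
    if items:
        return items
    # 3) Fallback: split into sentences on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.strip() for s in sentences if s.strip()]


from_json = parse_scenes('Here are the scenes: ["A car merges.", "The ego vehicle brakes."]')
from_list = parse_scenes("1. A car merges\n2) Ego vehicle brakes\n- Truck stops")
from_text = parse_scenes("A car merges. The ego vehicle brakes.")
print(from_json, from_list, from_text)
```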

**Violation Component**:
- Parses yes/no responses for traffic rule relevance grading
- Extracts violation status (found/not_found) and detailed reasoning
- Handles both JSON and natural language violation explanations
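
A comparable sketch for violation responses is shown below. The keyword heuristics are deliberately simple and are assumptions for illustration; the actual fallback logic may differ, and both function names are hypothetical.

```python
import json
import re


def parse_relevance(text: str) -> bool:
    """Interpret a yes/no grading answer; anything without a clear 'yes' is False."""
    match = re.search(r"\b(yes|no)\b", text, re.IGNORECASE)
    return bool(match) and match.group(1).lower() == "yes"


def parse_violation(text: str) -> dict:
    """Extract violation status and reasoning from JSON or natural language."""
    # 1) A JSON object embedded in the response.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict) and data.get("violation") in ("found", "not_found"):
            return {"violation": data["violation"],
                    "reason": str(data.get("reason", "")).strip()}
    # 2) Natural language: look for explicit negations before positive keywords.
    lowered = text.lower()
    negated = re.search(r"not[_ ]found|\bno (?:traffic rule )?violation", lowered)
    status = "found" if ("violation" in lowered and not negated) else "not_found"
    return {"violation": status, "reason": text.strip()}


as_json = parse_violation('{"violation": "found", "reason": "Unsafe lane change"}')
negative = parse_violation("No traffic rule violation found")
positive = parse_violation("Violation found: the sedan changed lanes unsafely.")
print(as_json, negative, positive)
```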

### Supported Gateway Models
- Anthropic Claude (Opus, Sonnet)
- Google Gemini 2.5 Pro  
- Perplexity Sonar Pro
- Alibaba Qwen models
- Meta Llama models via gateway
- And many more through gateway routing

## Error Handling

The tool includes robust error handling:
- Automatic retry logic for failed generations
- Graceful handling of interrupted runs
- Detailed error logging and reporting
- Resume capability for partial runs
- Gateway model fallback for compatibility issues
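
Retry and resume behavior of this kind is often built from two small pieces, sketched below. The exponential backoff, the default of three attempts, and the "skip when the output file already exists" rule are assumptions for illustration, not a description of the tool's internals.

```python
import time
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(task: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call task(), retrying with exponential backoff; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def pending_videos(video_ids: list[str], output_dir: Path,
                   overwrite: bool = False) -> list[str]:
    """Return videos that still need an output file, so a partial run can resume."""
    if overwrite:
        return list(video_ids)
    return [v for v in video_ids if not (output_dir / f"{v}.json").exists()]


calls = {"count": 0}


def flaky() -> str:
    """Demo task that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


delays: list[float] = []
result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
todo = pending_videos(["0000_cut_off_accident"], Path("directory_that_does_not_exist"))
print(result, delays, todo)
```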

## Integration

This tool integrates with the broader DriveGuard evaluation pipeline:

1. **s1_youtube_downloader**: Downloads raw videos
2. **s2_video_reviewer**: Reviews and marks clips  
3. **s3_extract_clips**: Extracts final dataset clips
4. **s4_annotate_ground_truth**: Creates ground truth annotations
5. **s5_generate_system_output**: ← This tool (generates system outputs)
6. **evaluation scripts**: Compare system outputs vs ground truth

## Development

### Adding New Components

To add support for a new component:

1. Create a new generator in `generators/` inheriting from `BaseSystemGenerator`
2. Implement the required abstract methods
3. Update the `create_generator()` function in `main.py`
4. Add the component to `AVAILABLE_COMPONENTS` in `config.py`

### Example New Generator

```python
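from pathlib import Path
from typing import Any

from .base_generator import BaseSystemGenerator


class MyComponentGenerator(BaseSystemGenerator):
    """Minimal generator skeleton for a new component."""

    def get_component_name(self) -> str:
        # Component name, as added to AVAILABLE_COMPONENTS in config.py
        return "my_component"

    def generate_output(self, video_path: Path) -> Any:
        # Your component logic here; this placeholder only echoes the video name
        result = {"video": video_path.stem}
        return result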
```

## Troubleshooting

**Import errors**: Ensure you're running from the project root with `uv run python -m ...`

**Missing videos**: Check that `data/dashcam/` contains MP4 files

**Model errors**: Verify API keys are set in `config/.env`

**Memory issues**: Reduce `--fps` or process fewer videos at once with `--videos`

## Requirements

- Python 3.12+
- UV package manager
- DriveGuard project dependencies
- Configured API keys for LLM models
- Sufficient disk space for outputs (JSON files are typically 5-50KB each)