# Tool Runner Three-Artifact Agent

## Overview

This agent demonstrates the **tool-runner pattern**: the agent itself is minimal and does nothing except execute analysis tools and pass their results through. The heavy lifting is done by Python tools that analyze the database schema, profile data patterns, discover relationships, and generate statistics.

## Architecture

### Three Artifacts

1. **eval_instructions.md** - Static SQL generation rules passed directly to the eval model
2. **agent.md** - Minimal agent that just executes tools
3. **tools/** - Comprehensive Python analysis tools

### Design Philosophy

- **Minimal Agent**: The agent is just 17 lines - it runs one command and returns output
- **Tool-Driven Analysis**: All intelligence is in the tools, not the agent
- **Comprehensive Output**: Tools generate complete analysis in structured JSON
- **No Interpretation**: Agent doesn't interpret or filter - just executes and returns

## Tools

### analyze_database.py
Main orchestrator that runs all analysis tools and combines results:
- Runs all tools in sequence
- Combines outputs into comprehensive analysis
- Generates both JSON and human-readable summaries
- Handles errors gracefully
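
The orchestration described above can be sketched roughly as follows. This is a minimal illustration, not the actual `analyze_database.py`: it assumes each tool can be wrapped as a zero-argument callable that returns a dict, and the name `run_all` and the `errors` key are hypothetical.

```python
from typing import Any, Callable, Dict


def run_all(tools: Dict[str, Callable[[], Dict[str, Any]]]) -> Dict[str, Any]:
    """Run each tool in order and merge the results into one dict.

    Hypothetical helper: a failing tool is recorded under "errors"
    instead of aborting the whole analysis.
    """
    combined: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for name, tool in tools.items():  # dicts preserve insertion order
        try:
            combined[name] = tool()
        except Exception as exc:  # keep going so the other tools still report
            errors[name] = f"{type(exc).__name__}: {exc}"
    combined["errors"] = errors
    return combined
```

Recording failures next to the successful results is one way to read "handles errors gracefully": a broken tool leaves a visible entry rather than an empty analysis.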

### extract_schema.py
Extracts complete schema information:
- Table definitions and CREATE statements
- Column types, nullability, defaults
- Primary keys and foreign keys
- Indexes and views
- Row counts per table
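
A minimal sketch of how this information can be read from SQLite, using only the standard `sqlite3` module and SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`. The function names and the shape of the returned dict are assumptions for illustration; the real `extract_schema.py` may differ, and indexes and views are left out here.

```python
import sqlite3
from typing import Any, Dict


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier so unusual table names are safe to embed."""
    return '"' + name.replace('"', '""') + '"'


def extract_schema(conn: sqlite3.Connection) -> Dict[str, Any]:
    """Collect columns, keys, and row counts for every user table."""
    schema: Dict[str, Any] = {}
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for name, create_sql in tables:
        table = quote_ident(name)
        columns: Dict[str, Any] = {}
        primary_keys = []
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, col, ctype, notnull, default, pk in conn.execute(
            f"PRAGMA table_info({table})"
        ):
            columns[col] = {
                "type": ctype,
                "nullable": not notnull,
                "default": default,
            }
            if pk:
                primary_keys.append(col)
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            {"column": row[3], "ref_table": row[2], "ref_column": row[4]}
            for row in conn.execute(f"PRAGMA foreign_key_list({table})")
        ]
        row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        schema[name] = {
            "create_statement": create_sql,
            "columns": columns,
            "primary_keys": primary_keys,
            "foreign_keys": foreign_keys,
            "row_count": row_count,
        }
    return schema
```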

### profile_data.py
Profiles data patterns and characteristics:
- Sample rows from each table
- Column-level analysis (null counts, unique values, patterns)
- Text pattern detection (emails, phones, IDs)
- Numeric analysis (ranges, integer vs decimal, currency detection)
- Date/time format detection
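
The column-level analysis can be illustrated with a small sketch. It assumes a single SQLite connection and covers only null percentage, distinct count, sample values, and email detection; `profile_column`, its parameters, and the deliberately loose email regex are hypothetical and not taken from `profile_data.py`.

```python
import re
import sqlite3
from typing import Any, Dict

# Deliberately loose email check; an assumption, not a full address validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _quote(name: str) -> str:
    """Quote an SQLite identifier."""
    return '"' + name.replace('"', '""') + '"'


def profile_column(
    conn: sqlite3.Connection, table: str, column: str, sample_size: int = 5
) -> Dict[str, Any]:
    """Profile one column: nulls, distinct values, samples, text patterns."""
    t, c = _quote(table), _quote(column)
    total, nulls, distinct = conn.execute(
        f"SELECT COUNT(*), SUM({c} IS NULL), COUNT(DISTINCT {c}) FROM {t}"
    ).fetchone()
    nulls = nulls or 0  # SUM() yields NULL on an empty table
    samples = [
        row[0]
        for row in conn.execute(
            f"SELECT DISTINCT {c} FROM {t} WHERE {c} IS NOT NULL LIMIT ?",
            (sample_size,),
        )
    ]
    patterns = []
    texts = [s for s in samples if isinstance(s, str)]
    # Only label the column when every sampled text value matches.
    if texts and all(EMAIL_RE.match(s) for s in texts):
        patterns.append("email")
    return {
        "null_percentage": round(100.0 * nulls / total, 1) if total else 0.0,
        "unique_values": distinct,
        "sample_values": samples,
        "common_patterns": patterns,
    }
```

Because the pattern check runs only on the sampled values, a larger `sample_size` trades speed for a lower chance of mislabeling a mixed column.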

### find_relationships.py
Discovers and analyzes table relationships:
- Explicit foreign key relationships
- Inferred relationships from naming patterns
- Dependency ordering for joins
- Join pattern analysis (one-to-many, many-to-many, self-joins)
- Relationship graph visualization
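
As an illustration of inference from naming patterns and of dependency ordering, the sketch below assumes one simple convention: a column named `<stem>_id` refers to a table named `<stem>` or `<stem>s`. Both function names and that convention are assumptions; `find_relationships.py` may use different heuristics. The ordering step relies on `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List


def infer_relationships(tables: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Guess foreign keys from column names such as `author_id`."""
    inferred = []
    for table, columns in tables.items():
        for column in columns:
            if not column.endswith("_id"):
                continue
            stem = column[: -len("_id")]
            # Try the singular and a naive plural table name.
            for candidate in (stem, stem + "s"):
                # Skip a table's own key-like column (for example books.book_id).
                if candidate in tables and candidate != table:
                    inferred.append(
                        {
                            "from_table": table,
                            "from_column": column,
                            "to_table": candidate,
                        }
                    )
                    break
    return inferred


def join_order(
    tables: Dict[str, List[str]], relationships: List[Dict[str, str]]
) -> List[str]:
    """Order tables so that referenced tables come before referencing ones.

    Raises graphlib.CycleError if the relationships contain a cycle.
    """
    graph = {table: set() for table in tables}
    for rel in relationships:
        graph[rel["from_table"]].add(rel["to_table"])
    return list(TopologicalSorter(graph).static_order())
```

For example, given a `books` table with an `author_id` column and an `authors` table, the sketch infers `books.author_id -> authors` and orders `authors` before `books`.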

### generate_statistics.py
Generates comprehensive database statistics:
- Database and table size metrics
- Column cardinality and selectivity
- Data quality issues (empty tables, high nulls, low cardinality)
- Performance optimization hints
- SQLite-specific recommendations

## Output Structure

The combined analysis provides:

```json
{
  "database_overview": {
    "description": "SQLite database analysis",
    "tables": ["table1", "table2"],
    "table_count": 2,
    "total_relationships": 5,
    "complexity": "medium"
  },
  "schema_details": {
    "tables": {
      "table1": {
        "columns": {...},
        "primary_keys": [...],
        "foreign_keys": [...],
        "row_count": 1000
      }
    }
  },
  "data_patterns": {
    "tables": {
      "table1": {
        "columns": {
          "col1": {
            "type": "TEXT",
            "null_percentage": 5.2,
            "sample_values": [...],
            "common_patterns": ["email"]
          }
        }
      }
    }
  },
  "relationships": {
    "foreign_keys": [...],
    "inferred_relationships": [...],
    "join_patterns": {...}
  },
  "statistics": {
    "database_size": {...},
    "table_statistics": {...},
    "data_quality": {...},
    "performance_hints": [...]
  },
  "sql_guidance": {
    "string_columns": {...},
    "date_columns": [...],
    "numeric_columns": [...],
    "recommendations": [...]
  }
}
```
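
A consumer of this structure might first check that all six top-level sections are present and then pull out individual fields. The helpers below are a hypothetical sketch based only on the keys shown in the example above; they are not part of the tools.

```python
from typing import Any, Dict, List

# Top-level sections as listed in the example output above.
EXPECTED_SECTIONS = [
    "database_overview",
    "schema_details",
    "data_patterns",
    "relationships",
    "statistics",
    "sql_guidance",
]


def missing_sections(analysis: Dict[str, Any]) -> List[str]:
    """Return the expected top-level sections that are absent."""
    return [key for key in EXPECTED_SECTIONS if key not in analysis]


def table_row_counts(analysis: Dict[str, Any]) -> Dict[str, int]:
    """Map each table name to its row count, tolerating missing fields."""
    tables = analysis.get("schema_details", {}).get("tables", {})
    return {name: info.get("row_count", 0) for name, info in tables.items()}
```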

## Advantages

1. **Reliability**: Python tools are deterministic and testable
2. **Completeness**: Tools analyze everything systematically
3. **Performance**: Tools run quickly without LLM overhead
4. **Debugging**: Python code is easier to debug than LLM behavior
5. **Extensibility**: Easy to add new analysis tools

## Disadvantages

1. **Flexibility**: Tools can't adapt to database patterns they weren't written to handle
2. **Context**: Tools have no natural-language understanding of the domain
3. **Evolution**: Tool code is harder for evolution to improve
4. **Size**: The large output might overwhelm the eval model

## Usage

The agent runs a single command:
```bash
python tools/analyze_database.py
```

This runs all analysis tools and saves output to:
- `tool_output/analysis.json` - Complete structured analysis
- `tool_output/analysis_summary.txt` - Human-readable summary
- `tool_output/schema.json` - Schema details
- `tool_output/data_profile.json` - Data profiling results
- `tool_output/relationships.json` - Relationship analysis
- `tool_output/statistics.json` - Statistical analysis
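
To inspect these files after a run, something like the following could be used. It is a small sketch that assumes the file names listed above and skips any file that is absent; `load_outputs` is a hypothetical helper, not part of the tools.

```python
import json
from pathlib import Path
from typing import Any, Dict

# JSON files listed above; the summary is plain text and handled separately.
JSON_OUTPUTS = [
    "analysis.json",
    "schema.json",
    "data_profile.json",
    "relationships.json",
    "statistics.json",
]


def load_outputs(output_dir: str = "tool_output") -> Dict[str, Any]:
    """Load whichever output files exist, keyed by file name without extension."""
    base = Path(output_dir)
    results: Dict[str, Any] = {}
    for filename in JSON_OUTPUTS:
        path = base / filename
        if path.exists():
            results[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    summary = base / "analysis_summary.txt"
    if summary.exists():
        results["analysis_summary"] = summary.read_text(encoding="utf-8")
    return results
```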

## Evolution Potential

While the agent is minimal, evolution can:
- Add new analysis tools
- Modify tool parameters
- Change output structure
- Adjust analysis priorities
- Add domain-specific tools

## Best Use Cases

This pattern works best for:
- Well-structured databases with clear schemas
- Standard relationship patterns
- Databases where the eval model benefits from a comprehensive up-front analysis
- Situations where reliability is more important than adaptability

## Comparison to Other Approaches

| Approach | Strengths | Weaknesses |
|----------|-----------|------------|
| Tool Runner | Reliable, complete, fast | Inflexible, no context understanding |
| Pure LLM | Adaptive, understands context | Slower, less reliable, may miss details |
| Hybrid | Balance of both | More complex to implement |

## Future Enhancements

Potential improvements:
1. Add analysis of SQL query patterns from past successful runs
2. Include domain-specific analysis tools
3. Add confidence scoring to inferred relationships
4. Generate SQL templates for common patterns
5. Add visualization of schema relationships