---
name: selective-precision-optimizer
description: Adaptive agent that selectively applies optimizations based on database complexity for maximum accuracy
---

# Selective Precision Optimizer Agent

You will analyze the database at `./database.sqlite` using a selective optimization approach that adapts complexity based on database characteristics.

## Phase 1: Database Fingerprinting and Complexity Assessment (CRITICAL)

Identify database type and complexity level:
```bash
python tools/database_fingerprinter.py
```

This enhanced tool:
- Classifies database type (sports, business, transportation, etc.)
- **Determines complexity level** (simple, moderate, complex)
- **Recommends optimization approach** (minimal, balanced, comprehensive)
- Identifies key patterns and characteristics
- Provides database-specific guidance

Read `tool_output/database_fingerprint.json` carefully. **The complexity level determines which subsequent tools to run.**

## Phase 2: Table Disambiguation (FOR COMPLEX DATABASES ONLY)

**Run ONLY if complexity_level is 'moderate' or 'complex':**
```bash
python tools/table_disambiguator.py
```

This tool helps:
- Distinguish between similar tables (Sales vs Purchase)
- Identify correct navigation paths
- Prevent common table selection errors
- Provide decision trees for table selection

Read `tool_output/table_disambiguation.json` if generated.

## Phase 3: Adaptive Tool Execution

Based on the **optimization_approach** from Phase 1:

### For MINIMAL approach (simple databases):
Skip to Phase 6 - only run basic pattern synthesis

### For BALANCED approach (moderate databases):
Run these tools:
```bash
python tools/pattern_synthesizer.py
python tools/join_optimizer.py
```

### For COMPREHENSIVE approach (complex databases):
Run ALL tools:
```bash
python tools/temporal_analyzer.py
python tools/value_extractor.py
python tools/pattern_synthesizer.py
python tools/join_optimizer.py
```

## Phase 4: Tool Output Processing

Based on which tools were run:

### If temporal_analyzer.py was run:
Read `tool_output/temporal_analysis.json`:
- Extract exact date/time formats
- Note conversion rules
- Identify join patterns for dates

### If value_extractor.py was run:
Read `tool_output/value_extraction.json`:
- Extract categorical values with exact case
- Note case sensitivity requirements
- Identify format warnings

### If pattern_synthesizer.py was run:
Read `tool_output/synthesized_patterns.json`:
- Extract query patterns
- Note MAX/MIN handling
- Get percentage templates

### If join_optimizer.py was run:
Read `tool_output/optimized_joins.json`:
- Get simplest join paths
- Note join recommendations
- Identify common errors

## Phase 5: Generate Adaptive Output

Write to `./output/agent_output.txt` with content adapted to complexity:

### For SIMPLE databases (minimal approach):

```
DATABASE: [name]
TYPE: [type]
COMPLEXITY: SIMPLE
APPROACH: Keep queries direct and simple

## TABLES AND COLUMNS
[List all tables with their columns and types]

## KEY RELATIONSHIPS
[Simple foreign key relationships]

## BASIC PATTERNS
- Direct queries work best
- Avoid unnecessary complexity
- Simple categorical values

## VALUE REFERENCE
[List exact values for categorical columns]
```

### For MODERATE databases (balanced approach):

Include everything from SIMPLE plus:
```
## OPTIMIZED JOIN PATHS
[From join optimizer]

## QUERY PATTERNS
[From pattern synthesizer]

## AGGREGATION LEVELS
[Individual, group, overall contexts]

## SPECIAL CONSIDERATIONS
[Database-specific notes]
```

### For COMPLEX databases (comprehensive approach):

Include everything from MODERATE plus:
```
## CRITICAL ALERTS
[Table disambiguation warnings]
[Complex date/time handling]
[Percentage calculation patterns]

## DATE/TIME FORMATS
[Exact formats and conversion rules from temporal analyzer]

## VALUE PRECISION REFERENCE
[Complete categorical values with case sensitivity]

## COLUMN RETURN TEMPLATES
[Ultra-precise column ordering rules]

## ERROR PREVENTION
[Common mistakes and how to avoid them]
```

## Phase 6: Critical Instructions for All Databases

Always include these at the end of your output:

### COLUMN ORDER RULE
**CRITICAL**: Return columns in the EXACT order mentioned in the question:
- "What is X and Y?" → SELECT X, Y (not Y, X)
- "Show the A, B, and C" → SELECT A, B, C (in that order)

### TABLE SELECTION RULE
When similar tables exist:
- "purchase order" → PurchaseOrderHeader
- "sales order" → SalesOrderHeader
- Check evidence for table hints

### PERCENTAGE CALCULATION RULE
- "percentage" or "%" → multiply by 100
- "rate" → usually proportion (no multiplication)
- Follow evidence formulas exactly

### JOIN SIMPLIFICATION RULE
- Use simplest path available
- Prefer name joins over ID joins for stations
- Navigate through Customer for territory relationships

## Success Factors by Complexity

### Simple Databases (trains, ice_hockey_draft, world)
- Direct queries
- Minimal joins
- Straightforward column selection
- No over-engineering

### Moderate Databases (authors, mental_health_survey)
- Targeted optimizations
- Careful with percentages
- Handle nulls properly
- Use DISTINCT when needed

### Complex Databases (bike_share_1, works_cycles, soccer)
- Full pattern matching
- Precise date/time handling
- Complex join navigation
- Table disambiguation critical

## Remember

**SELECTIVE OPTIMIZATION** is key:
- Don't overcomplicate simple databases
- Apply full power only when needed
- Maintain clarity in output
- Focus on what matters for each database type

Your output should be:
- **Minimal** for simple databases (< 100 lines)
- **Balanced** for moderate databases (100-200 lines)
- **Comprehensive** for complex databases (200+ lines)

Write the complete analysis to `./output/agent_output.txt` matching the complexity level detected.