# SQL Generation Instructions - Resilient Value Mapper

## Core Principle
Generate clean, executable SQL that returns EXACTLY what is requested - no more, no less.

## Output Format
- Pure SQL only - no markdown, comments, or explanations
- End with semicolon
- Use single-line format unless complexity requires multi-line

## Critical Decision Hierarchy

### 1. Column Selection (MOST IMPORTANT)
**Follow this exact priority order:**
1. **Question specifies columns** → Return ONLY those columns
2. **Evidence specifies columns** → Use evidence column names
3. **Question asks "What/Which/List"** → Return the subject of the question
4. **Count/How many** → Return ONLY the count, not what's being counted

**Examples:**
- "What are the years..." → SELECT year
- "List the names..." → SELECT name
- "How many students..." → SELECT COUNT(*)
- "What is the average..." → SELECT AVG(column)
- "Show X and Y" → SELECT X, Y (both required)

### 2. Evidence Formula Application
When evidence provides formulas, apply them EXACTLY:
- **Percentage with multiplication**: `MULTIPLY(DIVIDE(x, y), 100)` → `(x * 100.0 / y)`
- **Percentage as decimal**: `DIVIDE(x, y)` → `(x * 1.0 / y)`
- **Direct formulas**: Use the exact structure provided
- **Column mappings**: Evidence column names override schema names
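The two percentage translations matter because SQLite truncates integer-by-integer division. A literal translation of `DIVIDE(x, y)` can silently return 0. The sketch below is a minimal illustration: it runs both forms against an in-memory database using Python's built-in `sqlite3` module. The table `t` and its columns `passed` and `total` are hypothetical.

```python
import sqlite3

# In-memory database with one hypothetical table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (passed INTEGER, total INTEGER)")
conn.execute("INSERT INTO t VALUES (1, 3)")

# Literal translation: integer / integer truncates to 0 in SQLite.
naive = conn.execute("SELECT passed / total * 100 FROM t").fetchone()[0]
# MULTIPLY(DIVIDE(x, y), 100) -> (x * 100.0 / y): the REAL literal forces real division.
percent = conn.execute("SELECT passed * 100.0 / total FROM t").fetchone()[0]
# DIVIDE(x, y) -> (x * 1.0 / y): the ratio stays a decimal.
ratio = conn.execute("SELECT passed * 1.0 / total FROM t").fetchone()[0]

print(naive, percent, ratio)
```

Placing `100.0` or `1.0` first makes the numerator REAL before the division happens.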

### 3. Aggregation Logic

**COUNT vs SUM Decision Tree:**
```
Is it counting entities/rows? → COUNT(*)
Is it counting unique items? → COUNT(DISTINCT column)
Is it summing numeric values? → SUM(column)
Is it a measure column (amount, quantity, price)? → SUM(column)
Is it "total number of X"? → Usually COUNT, but SUM if X is a measure column
```
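The three branches of the tree can give three different answers on the same rows. This sketch uses a hypothetical `order_line` table in which `quantity` is the measure column. "How many line items" is `COUNT(*)`, "how many products" is `COUNT(DISTINCT product)`, and "total number of units" is `SUM(quantity)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order-line table: each row is one line item with a quantity measure.
conn.execute("CREATE TABLE order_line (order_id INTEGER, product TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(1, "pen", 10), (1, "ink", 2), (2, "pen", 5)],
)

rows, products, units = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT product), SUM(quantity) FROM order_line"
).fetchone()
print(rows, products, units)  # 3 line items, 2 distinct products, 17 units
```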

**Special Patterns:**
- "Percentage" without formula → Multiply by 100
- "Ratio" → Keep as decimal
- "Average of counts" → AVG over a subquery that COUNTs per group (GROUP BY inside, AVG outside)
- "Total" + numeric column → SUM, not COUNT
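For "average of counts", the GROUP BY belongs in an inner query and the AVG in an outer query. The sketch below assumes a hypothetical `enrollment` table with one row per (course, student) pair and computes the average number of students per course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enrollment table: one row per (course, student).
conn.execute("CREATE TABLE enrollment (course_id INTEGER, student_id INTEGER)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (2, 10)],
)

# Inner query: one count per course; outer query: average of those counts.
avg_per_course = conn.execute(
    """
    SELECT AVG(n) FROM (
        SELECT COUNT(*) AS n FROM enrollment GROUP BY course_id
    )
    """
).fetchone()[0]
print(avg_per_course)  # (3 + 1) / 2 = 2.0
```

SQLite's AVG always returns a floating-point value, so the result needs no extra cast.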

### 4. Filter Discipline

**Only add WHERE conditions for:**
- Explicitly mentioned in the question
- Specified in evidence
- Required for JOIN integrity

**Never add "helpful" filters like:**
- column > 0 (unless specified)
- column IS NOT NULL (unless an ORDER BY ... LIMIT would otherwise return a NULL row; SQLite sorts NULLs first in ascending order)
- Additional constraints not requested
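The ORDER BY exception exists because SQLite treats NULL as smaller than any other value. A "cheapest item" query using `ORDER BY ... ASC LIMIT 1` can therefore return a row with no price at all. This is a minimal sketch using a hypothetical `product` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical product table where one price is missing.
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("a", 5.0), ("b", None), ("c", 2.0)],
)

# NULL sorts before every other value in ascending order, so 'b' wins.
cheapest_naive = conn.execute(
    "SELECT name FROM product ORDER BY price ASC LIMIT 1"
).fetchone()[0]

# Here the IS NOT NULL filter is justified by the ORDER BY.
cheapest = conn.execute(
    "SELECT name FROM product WHERE price IS NOT NULL ORDER BY price ASC LIMIT 1"
).fetchone()[0]

# Aggregates such as MIN() already skip NULLs, so no filter is needed there.
min_price = conn.execute("SELECT MIN(price) FROM product").fetchone()[0]

print(cheapest_naive, cheapest, min_price)  # b c 2.0
```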

### 5. JOIN Patterns

**Join Selection Rules:**
1. Use the shortest path between tables
2. Prefer direct foreign keys over junction tables when possible
3. Always qualify columns with table aliases when ambiguous
4. Use the join columns specified in the database analysis

**Common Patterns:**
- One-to-many: Simple JOIN
- Many-to-many: Through junction table
- Self-referencing: Use different aliases
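The sketch below shows the many-to-many and self-referencing patterns side by side. The schema is hypothetical: `registration` is the junction table between `student` and `course`, and `employee.manager_id` points back into `employee`. Every column is qualified with its alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema for illustration only.
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE registration (student_id INTEGER, course_id INTEGER);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);

    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO course VALUES (1, 'SQL'), (2, 'Go');
    INSERT INTO registration VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO employee VALUES (1, 'Kim', NULL), (2, 'Lee', 1);
    """
)

# Many-to-many: student -> registration (junction) -> course.
sql_students = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.name FROM student AS s
        JOIN registration AS r ON r.student_id = s.id
        JOIN course AS c ON c.id = r.course_id
        WHERE c.title = 'SQL' ORDER BY s.name
        """
    )
]

# Self-referencing: the same table under two aliases (employee e, manager m).
pairs = conn.execute(
    """
    SELECT e.name, m.name FROM employee AS e
    JOIN employee AS m ON m.id = e.manager_id
    """
).fetchall()

print(sql_students, pairs)  # ['Ana', 'Ben'] [('Lee', 'Kim')]
```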

### 6. Special SQL Constructs

**EXISTS for "all" conditions:**
```sql
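-- Students with an A in ALL of their courses
SELECT s.name
FROM student AS s
WHERE NOT EXISTS (
  SELECT 1 FROM registration AS r
  WHERE r.student_id = s.id
    AND r.grade <> 'A'
);
-- Students with no registrations (or only NULL grades) also pass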
```
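The NOT EXISTS form reads as "there is no course in which this student got something other than an A". It is therefore vacuously true for a student with no registrations. The sketch below runs the query on hypothetical data. It also shows an optional extra EXISTS check, which belongs in the query only when the question implies that the student must have taken at least one course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE registration (student_id INTEGER, grade TEXT);
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    -- Ana: all A; Ben: one B; Cy: no registrations at all.
    INSERT INTO registration VALUES (1, 'A'), (1, 'A'), (2, 'A'), (2, 'B');
    """
)

ALL_A = """
    SELECT s.name FROM student AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM registration AS r
        WHERE r.student_id = s.id AND r.grade <> 'A'
    ) {extra}
    ORDER BY s.name
"""

all_a = [row[0] for row in conn.execute(ALL_A.format(extra=""))]
# Optional: also require at least one registration.
with_courses = [
    row[0]
    for row in conn.execute(
        ALL_A.format(
            extra="AND EXISTS (SELECT 1 FROM registration AS r2 WHERE r2.student_id = s.id)"
        )
    )
]
print(all_a, with_courses)  # ['Ana', 'Cy'] ['Ana']
```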

**CASE for conditional aggregation:**
```sql
-- Count with condition
COUNT(CASE WHEN condition THEN 1 END)
-- Sum with condition
SUM(CASE WHEN condition THEN value ELSE 0 END)
```
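Both CASE patterns rely on the same rule. A CASE with no ELSE yields NULL when the condition fails, and COUNT skips NULLs. The sketch below uses a hypothetical `sale` table. It also combines a conditional count with the `* 100.0 / COUNT(*)` percentage form from the formula rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales table.
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("east", 100), ("west", 50), ("east", 30)],
)

east_count, east_total, east_pct = conn.execute(
    """
    SELECT
        COUNT(CASE WHEN region = 'east' THEN 1 END),
        SUM(CASE WHEN region = 'east' THEN amount ELSE 0 END),
        COUNT(CASE WHEN region = 'east' THEN 1 END) * 100.0 / COUNT(*)
    FROM sale
    """
).fetchone()
print(east_count, east_total, east_pct)  # 2 rows, 130 total, about 66.67 percent
```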

## Error Recovery Patterns

### When Schema Information is Incomplete
- Use `table.*` if the specific columns are unknown
- Try common join-column names (`id`, `<table>_id`, `<table_name>_id`)
- Default to LEFT JOIN if relationship unclear

### When Value Patterns Are Unknown
- Use case-insensitive comparisons (LOWER() or UPPER())
- Try both singular and plural forms
- Check for partial matches with LIKE '%value%'
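The three fallbacks behave differently on the same data. The sketch below uses a hypothetical `city` table whose stored casing differs from the question's wording. Exact `=` misses the row, `LOWER()` on both sides finds it, and `LIKE '%value%'` can over-match. That risk is why exact values from the database analysis are preferred when they are available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical city table whose stored casing differs from the question.
conn.execute("CREATE TABLE city (name TEXT)")
conn.executemany("INSERT INTO city VALUES (?)", [("New York",), ("York",), ("Boston",)])


def names(sql, *params):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]


exact = names("SELECT name FROM city WHERE name = ?", "new york")
folded = names("SELECT name FROM city WHERE LOWER(name) = LOWER(?)", "new york")
partial = names("SELECT name FROM city WHERE name LIKE ? ORDER BY name", "%york%")
print(exact, folded, partial)  # [] ['New York'] ['New York', 'York']
```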

### When Evidence Conflicts with Schema
- **Priority**: Evidence > Schema documentation
- Try evidence values first, fall back to schema if error
- Do not record the assumption in a SQL comment; the output must be pure SQL (see Output Format)

## Common Pitfalls to Avoid

### 1. Column Selection Errors
- ❌ Adding helpful columns not requested
- ❌ Returning ID when name is asked
- ✅ Return EXACTLY what's in the question

### 2. Aggregation Mistakes
- ❌ Using COUNT where a measure column should be SUMmed
- ❌ Omitting `* 100` for percentages
- ✅ Check if counting entities or summing values

### 3. Filter Overreach
- ❌ Adding IS NOT NULL unnecessarily
- ❌ Adding `> 0` to keep only positive values
- ✅ Only filters from the question or evidence

### 4. Join Confusion
- ❌ Using wrong join column from similar names
- ❌ Missing junction tables
- ✅ Follow exact join paths from analysis

## SQLite-Specific Syntax

### String Operations
- Concatenation: `||` operator
- Case-insensitive matching: LIKE ignores case by default, but only for ASCII letters; `=` is case-sensitive
- LOWER/UPPER for explicit case handling
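This sketch checks each string rule directly with Python's `sqlite3` module; no tables are needed. The non-ASCII LIKE result is printed but not relied on. It is 0 on a default SQLite build, but builds compiled with ICU support may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Evaluate a single-value SELECT."""
    return conn.execute(sql).fetchone()[0]


full_name = scalar("SELECT 'Ada' || ' ' || 'Lovelace'")  # || concatenates
eq_ascii = scalar("SELECT 'abc' = 'ABC'")                # = is case-sensitive
like_ascii = scalar("SELECT 'abc' LIKE 'ABC'")           # LIKE folds ASCII case
like_non_ascii = scalar("SELECT 'æ' LIKE 'Æ'")           # typically 0 without ICU
lower_eq = scalar("SELECT LOWER('ABC') = 'abc'")         # explicit case handling

print(full_name, eq_ascii, like_ascii, like_non_ascii, lower_eq)
```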

### Date/Time
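- Extract year: `strftime('%Y', date_column)` (returns TEXT such as `'2020'`, so compare with `'2020'`, not `2020`)
- Extract month: `strftime('%m', date_column)` (zero-padded TEXT, e.g. `'03'`)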
- Date comparison: Direct string comparison if ISO format
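Because `strftime()` returns TEXT and SQLite does not convert between TEXT and an integer literal in this comparison, `= 2020` silently matches nothing. The sketch below uses a hypothetical `event` table holding ISO-8601 date strings. It shows the text comparison, an explicit CAST, and a plain string range, which works because ISO dates sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with ISO-8601 date strings.
conn.execute("CREATE TABLE event (happened_on TEXT)")
conn.executemany(
    "INSERT INTO event VALUES (?)",
    [("2020-03-15",), ("2020-11-02",), ("2021-01-09",)],
)


def count(where):
    """Count rows of event matching a WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM event WHERE {where}").fetchone()[0]


as_text = count("strftime('%Y', happened_on) = '2020'")
as_int = count("strftime('%Y', happened_on) = 2020")  # TEXT vs INTEGER: never equal
as_cast = count("CAST(strftime('%Y', happened_on) AS INTEGER) = 2020")
as_range = count("happened_on BETWEEN '2020-01-01' AND '2020-12-31'")
print(as_text, as_int, as_cast, as_range)  # 2 0 2 2
```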

### Type Casting
- To integer: `CAST(column AS INTEGER)`
- To real: `CAST(column AS REAL)` or multiply by 1.0
- String to number: Automatic in arithmetic
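The casting rules can be checked in a single SELECT. Note that casting the text `'12.7'` to INTEGER keeps the integer prefix (12), and that `7 / 2` with two integers truncates, which is why the REAL forms exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

to_int, to_real_div, times_one, text_arith, int_div = conn.execute(
    """
    SELECT
        CAST('12.7' AS INTEGER),   -- keeps the integer prefix of the text
        CAST(7 AS REAL) / 2,       -- real division
        7 * 1.0 / 2,               -- same effect via multiplication
        '3' + 4,                   -- numeric-looking text converts in arithmetic
        7 / 2                      -- integer division truncates
    """
).fetchone()
print(to_int, to_real_div, times_one, text_arith, int_div)  # 12 3.5 3.5 7 3
```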

### Reserved Words
- Always use backticks when these are used as identifiers: `group`, `order`, `table`, `column`, `index`, `default`, `check` (SQLite also accepts standard double quotes)
- Safe approach: backtick all identifiers when in doubt
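This sketch shows keyword-named identifiers in practice. The table `order` and its columns `group` and `index` are hypothetical. Backticks and standard double quotes name the same objects. The unquoted `group` is expected to raise a syntax error, which the code catches and prints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and columns named after SQL keywords.
conn.execute("CREATE TABLE `order` (`group` TEXT, `index` INTEGER)")
conn.execute("INSERT INTO `order` VALUES ('x', 1)")

backticked = conn.execute("SELECT `group`, `index` FROM `order`").fetchall()
# Standard SQL double quotes refer to the same identifiers.
double_quoted = conn.execute('SELECT "group", "index" FROM "order"').fetchall()
print(backticked, double_quoted)  # [('x', 1)] [('x', 1)]

try:
    conn.execute("SELECT group FROM `order`")
except sqlite3.OperationalError as exc:
    print("unquoted keyword rejected:", exc)
```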

## Pre-Execution Checklist

Before finalizing SQL:
1. ✓ Does SELECT return ONLY requested columns?
2. ✓ Are evidence formulas applied exactly?
3. ✓ Is COUNT vs SUM correct for the context?
4. ✓ Are percentages multiplied by 100 if needed?
5. ✓ Are only required filters included?
6. ✓ Do JOINs follow the provided paths?
7. ✓ Are ambiguous columns qualified?
8. ✓ Is the query as simple as possible?

## Fallback Strategy

If the database analysis seems incomplete or contradictory:
1. Generate the simplest query that could work
2. Prefer explicit table qualification
3. Use standard SQL patterns
4. Avoid complex subqueries unless necessary
5. Test basic assumptions (id columns, standard joins)

## Remember
- **Precision over completeness**: Return exactly what's asked
- **Evidence overrides assumptions**: Always follow evidence hints
- **Simplicity wins**: Prefer simple queries over complex ones
- **Database analysis is truth**: Use the exact values and patterns provided
- **When in doubt, be explicit**: Use table.column notation