# SQL Generation Instructions - Error Prevention Focus

## CRITICAL: Evidence Takes Priority
When evidence is provided, follow it EXACTLY:
- Use column names mentioned in evidence verbatim
- Apply formulas exactly as specified
- If evidence conflicts with these instructions, follow evidence

## Core Rules

### 1. Column Selection - Be Minimal
**Return ONLY what's explicitly requested:**
- "What is X?" → SELECT X (single column)
- "List the Y" → SELECT Y (single column)
- "Show A and B" → SELECT A, B (exactly two columns)
- "Give full name" → Concatenate if first/last are separate
- Never add ID columns unless requested
- Never add extra context columns
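
The minimal-column rule can be sketched as follows. This is an illustration only: the `employee` table, its columns, and its rows are hypothetical, and Python's built-in `sqlite3` module is used just to run the query against an in-memory database.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        salary INTEGER
    );
    INSERT INTO employee VALUES
        (1, 'Ada', 'Lovelace', 120),
        (2, 'Alan', 'Turing', 110);
""")

# Question: "Give the full name of the employee whose salary is 120."
# Exactly one output column: no id, no salary, nothing added "for context".
minimal_sql = """
    SELECT t1.first_name || ' ' || t1.last_name
    FROM employee AS t1
    WHERE t1.salary = 120;
"""
minimal_rows = conn.execute(minimal_sql).fetchall()
print(minimal_rows)  # [('Ada Lovelace',)]
```

The salary appears only in the filter, not in the SELECT list, because the question asks for the name alone.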

### 2. JOIN Strategy - Match Cardinality
**Choose JOIN type based on question intent:**
- Use INNER JOIN by default for "matching" records
- Use LEFT JOIN only when including "all" from left table
- Always use table aliases (short, like t1, t2)
- Qualify ALL columns with aliases
- Check for junction tables (many-to-many relationships)
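
A small sketch of how the JOIN choice changes the result, assuming hypothetical `author` and `book` tables in which one author has no books. Python's `sqlite3` module is used only to execute the SQL.

```python
import sqlite3

# Hypothetical tables and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 1, 'SQL Basics');
""")

# "Which authors have written a book?" -> only matching rows are wanted.
inner_sql = """
    SELECT t1.name
    FROM author AS t1
    INNER JOIN book AS t2 ON t2.author_id = t1.id;
"""
inner_rows = conn.execute(inner_sql).fetchall()

# "List all authors with their number of books" -> authors without books stay.
left_sql = """
    SELECT t1.name, COUNT(t2.id)
    FROM author AS t1
    LEFT JOIN book AS t2 ON t2.author_id = t1.id
    GROUP BY t1.name;
"""
# Sorted in Python only to make the printed output deterministic.
left_rows = sorted(conn.execute(left_sql).fetchall())

print(inner_rows)  # [('Ann',)]
print(left_rows)   # [('Ann', 1), ('Bob', 0)]
```

Counting `t2.id` rather than `*` in the LEFT JOIN query matters here: the unmatched author still produces one row, but its `t2.id` is NULL and is not counted.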

### 3. Aggregation Rules - Scope Matters
**COUNT questions:**
- "How many X?" → COUNT(*) or COUNT(DISTINCT x_id)
- "Number of Y per X" → GROUP BY X
- "Total" → SUM(column)
- "Average" → AVG(column)
- Use COUNT(DISTINCT ...) when counting unique entities
- Include GROUP BY for all non-aggregated columns
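
The difference between these aggregation scopes can be sketched on a hypothetical `orders` table (table name, columns, and rows are assumptions made for the example; `sqlite3` is used only to run the SQL).

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount INTEGER
    );
    INSERT INTO orders VALUES (1, 10, 5), (2, 10, 7), (3, 20, 3);
""")

# "How many orders are there?" -> count rows.
order_count = conn.execute(
    "SELECT COUNT(*) FROM orders AS t1;"
).fetchone()[0]

# "How many customers placed an order?" -> count unique entities.
customer_count = conn.execute(
    "SELECT COUNT(DISTINCT t1.customer_id) FROM orders AS t1;"
).fetchone()[0]

# "Total amount per customer" -> the non-aggregated column goes in GROUP BY.
totals_sql = """
    SELECT t1.customer_id, SUM(t1.amount)
    FROM orders AS t1
    GROUP BY t1.customer_id;
"""
# Sorted in Python only to make the printed output deterministic.
totals = sorted(conn.execute(totals_sql).fetchall())

print(order_count, customer_count, totals)  # 3 2 [(10, 12), (20, 3)]
```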

### 4. WHERE Clause - Operator Selection
**Match patterns correctly:**
- Exact match: column = 'value'
- Partial match: column LIKE '%pattern%'
- Null check: column IS NULL or IS NOT NULL
- Multiple values: column IN (value1, value2)
- Range: column BETWEEN x AND y
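
Each operator above can be tried against a small hypothetical `product` table. The helper `names_where` is an assumption introduced for this sketch, not part of any library, and `sqlite3` is used only to execute the SQL.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (name TEXT, category TEXT, price INTEGER);
    INSERT INTO product VALUES
        ('Green Tea', 'drink', 4),
        ('Tea Cup', 'kitchen', 9),
        ('Coffee', 'drink', 6),
        ('Mystery Box', NULL, 15);
""")

def names_where(condition):
    """Return the sorted product names that satisfy a WHERE condition."""
    sql = f"SELECT t1.name FROM product AS t1 WHERE {condition};"
    return sorted(row[0] for row in conn.execute(sql))

exact = names_where("t1.name = 'Coffee'")
partial = names_where("t1.name LIKE '%Tea%'")
missing = names_where("t1.category IS NULL")
not_null_trap = names_where("t1.category = NULL")  # never true: matches nothing
listed = names_where("t1.category IN ('drink', 'kitchen')")
ranged = names_where("t1.price BETWEEN 5 AND 10")  # both bounds are inclusive

print(exact)          # ['Coffee']
print(partial)        # ['Green Tea', 'Tea Cup']
print(missing)        # ['Mystery Box']
print(not_null_trap)  # []
print(listed)         # ['Coffee', 'Green Tea', 'Tea Cup']
print(ranged)         # ['Coffee', 'Tea Cup']
```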

### 5. SQLite Specific Patterns
- String concatenation: column1 || ' ' || column2
- Case sensitivity: LIKE is case-insensitive for ASCII letters by default; = is case-sensitive unless COLLATE NOCASE is applied
- Date functions: date(), datetime(), strftime()
- Integer division: 7 / 2 returns 3; use CAST(x AS REAL) to get decimals
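
These SQLite behaviors can be checked directly. The sketch below evaluates literal expressions only, so no schema is assumed; the helper `scalar` is a hypothetical convenience function, and `sqlite3` is used just to run each statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql):
    """Run a one-row, one-column query and return its single value."""
    return conn.execute(sql).fetchone()[0]

full_name = scalar("SELECT 'Ada' || ' ' || 'Lovelace';")  # concatenation
like_match = scalar("SELECT 'sqlite' LIKE 'SQLite';")     # ASCII case ignored
equals_match = scalar("SELECT 'sqlite' = 'SQLite';")      # = is case-sensitive
int_div = scalar("SELECT 7 / 2;")                         # integer division
real_div = scalar("SELECT CAST(7 AS REAL) / 2;")          # decimal division
year = scalar("SELECT strftime('%Y', '2021-03-15');")     # returns text

print(full_name)                 # Ada Lovelace
print(like_match, equals_match)  # 1 0
print(int_div, real_div)         # 3 3.5
print(year)                      # 2021
```

Note that `strftime` returns text, so comparing its result to a number needs a quoted value or a CAST.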

## Error Prevention Checklist

### Before Writing SQL:
1. **Identify requested columns** - What exactly should be returned?
2. **Check evidence** - Any column names or formulas provided?
3. **Determine joins** - Which tables needed? Junction tables?
4. **Plan aggregation** - COUNT/SUM/AVG? Need GROUP BY?
5. **Set filters** - WHERE conditions from question/evidence

### Common Mistakes to Avoid:
- ❌ Returning extra columns "for context"
- ❌ Using LEFT JOIN when INNER JOIN suffices
- ❌ Missing GROUP BY with aggregation
- ❌ Using = for partial string matches
- ❌ Ignoring evidence-provided formulas
- ❌ Adding ORDER BY when the question does not call for ordering
- ❌ Including LIMIT when no quantity is specified

## Special Cases

### Multiple Questions in One:
Split the question into parts, but return a single result:
- "What is X and which Y has most Z?"
- First get X, then identify Y with MAX(Z)
- May need subqueries or CTEs
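
One way to answer a two-part question in a single statement is to place each part in its own scalar subquery. The sketch below assumes a hypothetical `employee` table and the question "What is the average salary, and which department has the most employees?"; `sqlite3` is used only to run the SQL.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        (1, 'eng', 100),
        (2, 'eng', 200),
        (3, 'ops', 300);
""")

# Part 1: average salary. Part 2: department with the most employees.
# ORDER BY ... LIMIT 1 is justified here because "most" asks for a top result.
combined_sql = """
    SELECT
        (SELECT AVG(t1.salary) FROM employee AS t1),
        (SELECT t2.dept
         FROM employee AS t2
         GROUP BY t2.dept
         ORDER BY COUNT(*) DESC
         LIMIT 1);
"""
combined_rows = conn.execute(combined_sql).fetchall()
print(combined_rows)  # [(200.0, 'eng')]
```

The result is one row with one column per sub-question, which keeps the output a single result set.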

### Percentages:
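- Formula: SUM(CASE WHEN condition THEN 1 ELSE 0 END) * 100.0 / COUNT(*)
- Do not use COUNT(condition) as the numerator: it counts every row where the condition is non-NULL, including rows where it is false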
- Always use 100.0 for decimal results
- CAST counts as REAL if needed
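
A caveat worth checking when computing percentages: in SQLite, `COUNT(expr)` counts every row where `expr` is non-NULL, and a false comparison evaluates to 0, not NULL. The sketch below contrasts that with a conditional SUM on a hypothetical `task` table where one of four tasks is done; `sqlite3` is used only to run the SQL.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO task VALUES (1, 'done'), (2, 'open'), (3, 'open'), (4, 'open');
""")

# COUNT(condition) counts all four rows, because 0 (false) is not NULL.
naive_pct = conn.execute("""
    SELECT COUNT(t1.status = 'done') * 100.0 / COUNT(*)
    FROM task AS t1;
""").fetchone()[0]

# A conditional SUM counts only the rows where the condition holds.
correct_pct = conn.execute("""
    SELECT SUM(CASE WHEN t1.status = 'done' THEN 1 ELSE 0 END) * 100.0 / COUNT(*)
    FROM task AS t1;
""").fetchone()[0]

print(naive_pct)    # 100.0
print(correct_pct)  # 25.0
```

Multiplying by 100.0 before dividing keeps the arithmetic in floating point, so no separate CAST is needed in this form.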

### "Both/All" Conditions:
- "In both A and B" → Filter with category IN ('A', 'B'), GROUP BY the entity, then use HAVING COUNT(DISTINCT category) = 2
- "All of these" → Verify all conditions are met, e.g. HAVING COUNT(DISTINCT category) = N for N required values
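
The "both" pattern only works when the rows are first restricted to the two categories of interest. A sketch on a hypothetical `enrollment` junction table (students and courses are made up for the example; `sqlite3` is used only to run the SQL):

```python
import sqlite3

# Hypothetical junction table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollment (student TEXT, course TEXT);
    INSERT INTO enrollment VALUES
        ('Ann', 'Math'), ('Ann', 'Physics'),
        ('Bob', 'Math'),
        ('Cy', 'Physics'), ('Cy', 'Art');
""")

# "Which students are in both Math and Physics?"
both_sql = """
    SELECT t1.student
    FROM enrollment AS t1
    WHERE t1.course IN ('Math', 'Physics')
    GROUP BY t1.student
    HAVING COUNT(DISTINCT t1.course) = 2;
"""
both_rows = conn.execute(both_sql).fetchall()

# Without the WHERE filter, any two distinct courses satisfy the HAVING clause.
unfiltered_sql = """
    SELECT t1.student
    FROM enrollment AS t1
    GROUP BY t1.student
    HAVING COUNT(DISTINCT t1.course) = 2;
"""
# Sorted in Python only to make the printed output deterministic.
unfiltered_rows = sorted(conn.execute(unfiltered_sql).fetchall())

print(both_rows)        # [('Ann',)]
print(unfiltered_rows)  # [('Ann',), ('Cy',)]
```

The unfiltered variant wrongly includes Cy, who takes Physics and Art but not Math.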

## Output Format
- Clean SQL only
- No markdown formatting
- No comments
- End with semicolon
- One statement only (use CTEs if complex)

## Final Reminder
**When in doubt:**
1. Follow evidence literally
2. Return minimal columns
3. Use simple, direct approach
4. Test your logic: Will this return exactly what was asked?