# SQL Generation Instructions - Precision Cross-Pollinated

## CRITICAL OUTPUT REQUIREMENTS

Your SQL must be **CLEAN and EXECUTABLE**:
- **NO markdown formatting** (no ```, no code blocks)
- **NO comments** in or after SQL
- **NO explanatory text** following queries
- **NO formatting symbols** or decorations
- **ONLY executable SQL statements**
- **End with semicolon**, nothing after
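
These requirements can be checked mechanically before a query is accepted. The sketch below is one possible check using Python's built-in `sqlite3` module; `is_clean_sql` is a hypothetical helper, not part of any library, and its comment detection is deliberately naive (it would also reject `--` inside a string literal).

```python
import sqlite3

FENCE = "`" * 3  # markdown code-fence marker, built indirectly for readability


def is_clean_sql(text: str) -> bool:
    """Return True if `text` looks like bare, executable SQL output.

    Hypothetical helper: these are heuristics, not a SQL parser.
    """
    stripped = text.strip()
    if FENCE in stripped:                      # markdown code block
        return False
    if "--" in stripped or "/*" in stripped:   # SQL comments (naive check)
        return False
    if not stripped.endswith(";"):             # nothing may follow the semicolon
        return False
    # complete_statement only checks that the text forms complete,
    # semicolon-terminated statement(s); it does not validate the SQL itself.
    return sqlite3.complete_statement(stripped)


clean = "SELECT name FROM customers;"
fenced = FENCE + "sql\nSELECT name FROM customers;\n" + FENCE
commented = "SELECT name FROM customers; -- all customers"
unterminated = "SELECT name FROM customers"
```

Only `clean` passes; the other three strings each violate one of the rules above.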

## THE FIVE PILLARS OF SQL ACCURACY (Weighted by Impact)

### Pillar 1: Ultra-Strict Column Selection (40% of accuracy)

**THE GOLDEN RULE**: If the question asks for ONE thing, return EXACTLY ONE column.

#### Single Answer Patterns - Return ONE Column
- "What is the X?" → `SELECT X` (NEVER add Y, Z)
- "State the X for Y" → `SELECT X` (NOT X, Y)
- "Which X?" → `SELECT X` (just the identifier/name)
- "List X" → `SELECT X` (one column only)
- "Show the X" → `SELECT X`
- "Name of X" → `SELECT X`
- "Identify the X" → `SELECT X`

#### Count Patterns - Return ONE Count
- "How many X?" → `SELECT COUNT(*)` (NOT COUNT(*), name)
- "Number of X" → `SELECT COUNT(*)`
- "Total X" (when countable) → `SELECT COUNT(*)`
- "How many unique/different X?" → `SELECT COUNT(DISTINCT X)`
- "How many distinct X?" → `SELECT COUNT(DISTINCT X)`

#### Aggregation Patterns - Return ONE Value
- "Total X" (when summable) → `SELECT SUM(X)`
- "Average/Mean X" → `SELECT AVG(X)`
- "Maximum/Highest X" → `SELECT MAX(X)` 
- "Minimum/Lowest X" → `SELECT MIN(X)`
- "Sum of X" → `SELECT SUM(X)`

#### Critical Violations to NEVER Commit
- ❌ Adding zip_code when asking for population
- ❌ Including city when asking for state
- ❌ Adding COUNT when not requested
- ❌ Including MAX/MIN wrapper unnecessarily
- ❌ Returning ID with name unless ID requested
- ❌ Including grouping columns unless explicitly requested
- ❌ Adding "context" columns for clarity
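
The column-count rule is easy to verify against a live database. The sketch below uses Python's built-in `sqlite3` module and a made-up `zip_data` table (the table, its columns, and the sample row are assumptions for illustration) to compare a one-column answer with an inflated one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zip_data (zip_code TEXT, city TEXT, state TEXT, population INTEGER);
    INSERT INTO zip_data VALUES ('10001', 'New York', 'NY', 21102);
""")

# Question: "What is the population of zip code 10001?"
good = conn.execute(
    "SELECT population FROM zip_data WHERE zip_code = '10001';"
)
# Column inflation: zip_code was not asked for.
bad = conn.execute(
    "SELECT zip_code, population FROM zip_data WHERE zip_code = '10001';"
)

# cursor.description holds one entry per returned column; index 0 is the name.
good_columns = [d[0] for d in good.description]
bad_columns = [d[0] for d in bad.description]
```

`good_columns` contains a single name, which is the shape the question calls for; `bad_columns` contains two.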

### Pillar 2: Evidence Reconciliation & Smart Fallbacks (25% of accuracy)

**PROACTIVE ERROR PREVENTION**: Anticipate evidence naming variations

#### Evidence Name Mapping Strategy
When evidence mentions a column name, check for it in this order:
1. **Exact match**: Column exactly as specified
2. **Case variations**: CompanyName, company_name, COMPANYNAME
3. **Underscore/space**: ship_via vs "Ship Via" vs ShipVia
4. **Semantic equivalents**: 
   - "CompanyName" → supplier_name (in supplier context)
   - "CompanyName" → customer_name (in customer context)
   - "Salary" → compensation, wage, pay_rate
   - "Territory" → region, area, district

#### Evidence Override Rules
1. **If evidence specifies exact value**: Use it literally
   - "tmIDLoser = 'ABC'" → Use 'ABC' not team name
   - "refers to X = 'value'" → Use X = 'value' EXACTLY

2. **If evidence specifies calculation**: Apply formula exactly
   - Don't optimize or "improve" the formula
   - Use the exact functions specified

3. **If column doesn't exist**: Only then apply fallback mapping
   - Check variations systematically
   - Document the assumed mapping in your reasoning, never in the SQL output
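
The lookup order above (exact match, case and separator variations, then semantic equivalents) could be sketched as follows. Everything here is illustrative: `resolve_column`, `_normalize`, the `suppliers` table, and the `SYNONYMS` map are hypothetical names, and the synonym list would have to be supplied per schema.

```python
import sqlite3
from typing import Optional


def _normalize(name: str) -> str:
    """Lowercase and drop underscores/spaces: 'Ship Via' -> 'shipvia'."""
    return name.lower().replace("_", "").replace(" ", "")


def resolve_column(conn: sqlite3.Connection, table: str, evidence_name: str,
                   synonyms: Optional[dict] = None) -> Optional[str]:
    """Map a column name from the evidence to a real column of `table`.

    Returns the actual column name, or None if nothing matches.
    """
    # PRAGMA table_info yields one row per column; index 1 is the name.
    # `table` is interpolated directly, so it must come from a trusted schema.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    if evidence_name in columns:                        # 1. exact match
        return evidence_name
    by_normalized = {_normalize(c): c for c in columns}
    key = _normalize(evidence_name)
    if key in by_normalized:                            # 2-3. case / separators
        return by_normalized[key]
    for candidate in (synonyms or {}).get(key, []):     # 4. semantic equivalents
        if _normalize(candidate) in by_normalized:
            return by_normalized[_normalize(candidate)]
    return None


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE suppliers (supplier_id INTEGER, supplier_name TEXT, "Ship Via" TEXT)'
)
# Assumed synonym map for a supplier context.
SYNONYMS = {"companyname": ["supplier_name", "customer_name"]}
```

With this setup, `resolve_column(conn, "suppliers", "ship_via")` resolves to the quoted column `Ship Via`, and `"CompanyName"` resolves to `supplier_name` only when the synonym map is passed in.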

### Pillar 3: Simplified Query Patterns (20% of accuracy)

**SIMPLICITY WINS**: Always choose the simplest pattern that works

#### For Finding Extremes
```sql
-- ✅ PREFERRED: Simple ORDER BY with LIMIT
SELECT column
FROM table
ORDER BY metric DESC
LIMIT 1;

-- ⚠️ ONLY when specifically needed (e.g., every tied row must be returned): Subquery approach
SELECT column
FROM table
WHERE metric = (SELECT MAX(metric) FROM table);
```
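
The two forms differ only when several rows share the extreme value. A minimal sketch, assuming a hypothetical `players` table in which two players tie for the top score:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (name TEXT, points INTEGER);
    INSERT INTO players VALUES ('Ann', 30), ('Bob', 30), ('Cy', 12);
""")

# ORDER BY + LIMIT: exactly one row, even though two players share the top score.
# Which of the tied rows comes back is not guaranteed.
top_one = conn.execute(
    "SELECT name FROM players ORDER BY points DESC LIMIT 1;"
).fetchall()

# Subquery form: every row whose points equal the maximum.
top_all = conn.execute(
    "SELECT name FROM players "
    "WHERE points = (SELECT MAX(points) FROM players) "
    "ORDER BY name;"
).fetchall()
```

`top_one` holds a single row while `top_all` holds both tied players, so the subquery form is only worth its extra complexity when the question asks for all of them.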

#### Direction Rules - Critical for Accuracy
- "Most/Highest/Maximum" → `ORDER BY DESC`
- "Least/Lowest/Minimum" → `ORDER BY ASC`
- "Best ranking" → `ORDER BY rank ASC` (lower number = better)
- "Worst ranking" → `ORDER BY rank DESC` (higher number = worse)
- "Top N" → `ORDER BY metric DESC LIMIT N`
- "Bottom N" → `ORDER BY metric ASC LIMIT N`
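
One caveat when applying the ASC rules: SQLite sorts NULL before every other value in ascending order, so a "lowest" query can return a row whose metric is missing. The sketch below assumes a hypothetical `products` table with one NULL price; whether NULL rows should be filtered out depends on the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, price REAL);
    INSERT INTO products VALUES ('pen', 1.5), ('ink', NULL), ('pad', 3.0);
""")

# "Highest price": NULL sorts last in DESC, so it does not interfere.
highest = conn.execute(
    "SELECT name FROM products ORDER BY price DESC LIMIT 1;"
).fetchone()

# "Lowest price": NULL sorts first in ASC, so the row without a price wins.
lowest_naive = conn.execute(
    "SELECT name FROM products ORDER BY price ASC LIMIT 1;"
).fetchone()

# Excluding NULLs gives the lowest actual price.
lowest_filtered = conn.execute(
    "SELECT name FROM products WHERE price IS NOT NULL "
    "ORDER BY price ASC LIMIT 1;"
).fetchone()
```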

#### Avoid Over-Engineering
- ✅ Direct COUNT(*)
- ❌ COUNT with unnecessary GROUP BY
- ✅ Simple ORDER BY with LIMIT
- ❌ Complex nested subqueries
- ✅ Direct aggregation
- ❌ Window functions when not needed

### Pillar 4: Table Attribution Precision (10% of accuracy)

**CRITICAL**: Use the correct table's columns

Before writing ANY SQL, verify:
- Which table is the authoritative source for this data?
- Am I using the correct table's version of this column?
- For JOINs, which table should filters apply to?

Common Attribution Errors to Avoid:
- Team stats: teams.wins NOT coaches.wins
- Player stats: players_teams.points NOT players.points  
- Season data: Check regular vs playoff tables
- Current vs historical: Use the right temporal table

### Pillar 5: Smart String Matching (5% of accuracy)

**ADAPTIVE MATCHING**: Choose operator based on data type

#### Default String Matching Rules
- **Person/Company/Product Names** → Use `LIKE '%value%'`
- **Descriptions/Text fields** → Use `LIKE '%value%'`
- **Status/Category values** → Try `= 'value'` first, then `LIKE`
- **Codes/IDs** → Always use `=` (exact match)
- **Enums/Constants** → Always use `=`

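Remember: SQLite `LIKE` is case-insensitive by default (for ASCII characters only), while `=` is case-sensitive unless the column uses a `NOCASE` collation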
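
The difference between the two operators shows up as soon as the stored casing varies. A minimal sketch, assuming a hypothetical `employees` table with inconsistent casing in both a name column and a code column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (full_name TEXT, dept_code TEXT);
    INSERT INTO employees VALUES ('Alice Smith', 'ENG'), ('alice jones', 'eng');
""")


def count(sql: str) -> int:
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]


# LIKE ignores ASCII case, so both names match.
like_matches = count(
    "SELECT COUNT(*) FROM employees WHERE full_name LIKE '%alice%';"
)
# = compares case-sensitively, so only the upper-case code matches.
exact_matches = count(
    "SELECT COUNT(*) FROM employees WHERE dept_code = 'ENG';"
)
```

Here `LIKE` finds both rows and `=` finds one, which is why exact matching is reserved for codes, IDs, and enums whose stored form is known.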

## JOIN BEST PRACTICES

**MANDATORY**: Always use aliases and qualify ALL columns

```sql
-- ✅ CORRECT: Every column qualified
SELECT t1.name, t2.total
FROM customers t1
JOIN orders t2 ON t1.id = t2.customer_id
WHERE t2.status = 'shipped';

-- ❌ WRONG: Ambiguous references
SELECT name, total
FROM customers
JOIN orders ON id = customer_id;
```
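
Unqualified references are not only a style issue: when both tables contain a column with the same name, SQLite rejects the query. The sketch below assumes that `customers` and `orders` each have an `id` column (the schema is made up for illustration) and runs both forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, status TEXT
    );
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders VALUES (10, 1, 99.5, 'shipped');
""")

# Qualified form: runs and returns the joined row.
qualified = conn.execute(
    "SELECT t1.name, t2.total FROM customers t1 "
    "JOIN orders t2 ON t1.id = t2.customer_id "
    "WHERE t2.status = 'shipped';"
).fetchall()

# Unqualified form: `id` exists in both tables, so the statement fails to compile.
try:
    conn.execute(
        "SELECT name, total FROM customers JOIN orders ON id = customer_id;"
    )
    unqualified_error = None
except sqlite3.OperationalError as exc:
    unqualified_error = str(exc)
```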

## SQLite-SPECIFIC SYNTAX (Critical)

### Function Mappings - Use SQLite Versions
- String concatenation: `||` NOT `CONCAT()` (`CONCAT()` exists only in SQLite 3.44.0 and later)
- Date extraction: `STRFTIME('%Y', date)` NOT `YEAR()`
- String length: `LENGTH()` NOT `LEN()`
- Null handling: `COALESCE()` NOT `ISNULL()`
- Row limiting: `LIMIT n` NOT `TOP n`
- Boolean values: 1/0 NOT TRUE/FALSE (the `TRUE`/`FALSE` keywords are recognized only in SQLite 3.23.0 and later)

### Date Patterns for SQLite
- Extract year: `STRFTIME('%Y', date_column)`
- Extract month: `STRFTIME('%m', date_column)`
- Extract day: `STRFTIME('%d', date_column)`
- Date arithmetic: `DATE(date_column, '+N days')`
- Current date: `DATE('now')`
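
A detail that is easy to miss with these patterns: `STRFTIME` returns text, so comparing its result with an integer literal never matches. The sketch below evaluates the patterns on a fixed sample date (the date itself is arbitrary) using Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute(
    "SELECT STRFTIME('%Y', '2021-03-15'), "
    "       STRFTIME('%m', '2021-03-15'), "
    "       DATE('2021-03-15', '+7 days'), "
    "       STRFTIME('%Y', '2021-03-15') = 2021, "
    "       STRFTIME('%Y', '2021-03-15') = '2021', "
    "       CAST(STRFTIME('%Y', '2021-03-15') AS INTEGER) = 2021;"
).fetchone()

# year and month come back as zero-padded strings, not numbers.
year, month, week_later, text_vs_int, text_vs_text, cast_vs_int = row
```

`text_vs_int` is 0 while `text_vs_text` and `cast_vs_int` are 1, so year filters should compare against `'2021'` or cast the extracted value first.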

## PRE-QUERY VERIFICATION CHECKLIST

Before finalizing your SQL, verify:

- ✓ **Column Count**: Am I returning EXACTLY the requested number of columns?
- ✓ **Evidence Check**: Did I handle evidence name variations?
- ✓ **Simplicity**: Is this the simplest pattern that works?
- ✓ **Attribution**: Am I using the correct table's columns?
- ✓ **String Matching**: Did I choose the right operator (LIKE vs =)?
- ✓ **Join Qualification**: Are ALL columns qualified with aliases?
- ✓ **SQLite Syntax**: Am I using SQLite-specific functions?
- ✓ **Clean Output**: No markdown, comments, or extra text?

## COMMON FAILURE PATTERNS (Learn from These)

1. **Column Inflation**: Adding unrequested columns "for context"
2. **Evidence Literalism**: Not handling evidence name variations  
3. **Over-Complexity**: Using subqueries when ORDER BY suffices
4. **Attribution Error**: Using wrong table's version of column
5. **Ambiguous Joins**: Unqualified column references
6. **Wrong Syntax**: Using non-SQLite functions
7. **String Mismatch**: Using = for names that need LIKE

## SPECIAL PATTERNS

### Percentage Calculations
```sql
-- Correct SQLite percentage pattern
SELECT CAST(COUNT(CASE WHEN condition THEN 1 END) AS REAL) * 100.0 / COUNT(*)
FROM table;
```
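
The `CAST(... AS REAL)` in this pattern matters because SQLite divides integers as integers. A minimal sketch, assuming a hypothetical `orders` table in which one of three orders is shipped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, status TEXT);
    INSERT INTO orders VALUES (1, 'shipped'), (2, 'pending'), (3, 'pending');
""")

# All-integer arithmetic: 1 * 100 / 3 is truncated to a whole number.
truncated = conn.execute(
    "SELECT COUNT(CASE WHEN status = 'shipped' THEN 1 END) * 100 / COUNT(*) "
    "FROM orders;"
).fetchone()[0]

# Pattern from above: casting to REAL keeps the fractional part.
exact = conn.execute(
    "SELECT CAST(COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS REAL) "
    "* 100.0 / COUNT(*) FROM orders;"
).fetchone()[0]
```

`truncated` is an integer and loses the fraction, while `exact` is a float close to 33.33.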

### Counting Distinct vs All
- "How many customers?" (asked of a table where one customer can span several rows, such as orders) → `COUNT(DISTINCT customer_id)`
- "How many orders?" → `COUNT(*)` if one row per order
- "Number of unique Y?" → `COUNT(DISTINCT Y)`
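
Which form is right depends on the grain of the table being counted. A short sketch, assuming a hypothetical `orders` table with one row per order and a repeated `customer_id`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT);
    INSERT INTO orders VALUES (1, 'c1'), (2, 'c1'), (3, 'c2');
""")

# One row per order, so COUNT(*) answers "How many orders?".
order_count = conn.execute("SELECT COUNT(*) FROM orders;").fetchone()[0]

# Customers repeat across rows, so "How many customers?" needs DISTINCT.
customer_count = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM orders;"
).fetchone()[0]
```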

### Implicit Grouping
```sql
-- "What is the highest total?" (by implicit groups)
SELECT SUM(amount)
FROM sales
GROUP BY store_id
ORDER BY SUM(amount) DESC
LIMIT 1;
```
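
The grouped query still returns a single column even though it groups by `store_id`, and it answers a different question than a plain `MAX`. A minimal sketch, assuming a hypothetical `sales` table with made-up amounts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (store_id INTEGER, amount INTEGER);
    INSERT INTO sales VALUES (1, 10), (1, 20), (2, 25);
""")

# Highest per-store total: store 1 sums to 30, store 2 to 25.
highest_total = conn.execute(
    "SELECT SUM(amount) FROM sales "
    "GROUP BY store_id ORDER BY SUM(amount) DESC LIMIT 1;"
).fetchall()

# Largest single sale, for comparison: a different question and answer.
largest_single = conn.execute("SELECT MAX(amount) FROM sales;").fetchall()
```

`highest_total` is one row with one column, with `store_id` left out of the SELECT list as the column-selection rule requires.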

## DECISION FLOW

```
1. How many columns needed? → Return EXACTLY that many
2. Evidence provided? → Apply with smart fallbacks
3. String matching? → Choose LIKE vs = by data type
4. Aggregation needed? → Use simplest pattern
5. Joins required? → Qualify EVERY column
6. Final check → Clean SQL, no extras
```

## FINAL REMINDERS

**Success Formula**:
1. **Count** the required columns precisely
2. **Reconcile** evidence with smart name mapping
3. **Simplify** to the most basic working pattern
4. **Attribute** to the correct authoritative table
5. **Match** strings with appropriate operators
6. **Output** clean, executable SQL only

Remember: Precision and simplicity win. Return EXACTLY what's asked using the simplest approach, handle evidence intelligently, and output nothing but clean SQL.