# SQL Generation Instructions

## CRITICAL OUTPUT REQUIREMENTS

Your SQL must be **CLEAN and EXECUTABLE**:
- **NO markdown formatting** (no ```, no code blocks)
- **NO comments** in or after SQL
- **NO explanatory text** following queries
- **NO formatting symbols** or decorations
- **ONLY executable SQL statements**
- **End with semicolon**, nothing after

## THE FIVE PILLARS OF ACCURATE SQL

### Pillar 1: Ultra-Strict Column Selection (40% of accuracy)

**THE GOLDEN RULE**: If the question asks for ONE thing, return EXACTLY ONE column.

#### Single Answer Patterns
- "What is the X?" → `SELECT X`
- "State the X" → `SELECT X`
- "Which X?" → `SELECT X`
- "List X" → `SELECT X`
- "Show the X" → `SELECT X`

#### Count Patterns
- "How many X?" → `SELECT COUNT(*)`
- "Number of X" → `SELECT COUNT(*)`
- "Total X" (countable) → `SELECT COUNT(*)`
- "How many unique X?" → `SELECT COUNT(DISTINCT X)`

#### Aggregation Patterns
- "Total X" (summable) → `SELECT SUM(X)`
- "Average X" → `SELECT AVG(X)`
- "Maximum X" → `SELECT MAX(X)`
- "Minimum X" → `SELECT MIN(X)`
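As a minimal sketch, here is how each count and aggregation pattern returns exactly one column and one row. It assumes a hypothetical `sales` table, and Python's built-in `sqlite3` module serves only as a convenient way to run the SQLite queries.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (1, 30.0), (2, 20.0)],
)

# Each question maps to exactly one aggregate column
queries = {
    "How many sales?": "SELECT COUNT(*) FROM sales;",
    "Total amount": "SELECT SUM(amount) FROM sales;",
    "Average amount": "SELECT AVG(amount) FROM sales;",
    "Maximum amount": "SELECT MAX(amount) FROM sales;",
    "Minimum amount": "SELECT MIN(amount) FROM sales;",
}

results = {}
for question, sql in queries.items():
    cursor = conn.execute(sql)
    # cursor.description holds one entry per returned column
    assert len(cursor.description) == 1
    results[question] = cursor.fetchone()[0]

for question, value in results.items():
    print(f"{question}: {value}")
```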

#### Critical Violations to AVOID
- ❌ Adding ID when asking for name
- ❌ Including name when asking for ID
- ❌ Adding COUNT when not requested
- ❌ Including grouping columns unless requested
- ❌ Returning multiple columns for single-item questions
- ❌ Adding "context" columns

### Pillar 2: Table Attribution Clarity (25% of accuracy)

**Before writing SQL, identify:**
- Which table owns each requested column?
- What's the correct table for WHERE conditions?
- Are you using the right table's version of a column?

**Common Attribution Errors:**
- Using coaches.wins instead of teams.wins
- Mixing player stats from wrong table
- Confusing regular season vs playoff tables
- Using junction table columns instead of main table
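The coaches/teams case can be sketched with a hypothetical schema in which both tables carry a `wins` column with different meanings. The table names, columns, and values are assumptions for illustration, and the queries run through Python's built-in `sqlite3` module. A question about a team's wins should read `teams.wins`. Reading `coaches.wins` instead returns one partial value per coach.

```python
import sqlite3

# Hypothetical schema: both tables have a "wins" column with different meanings
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (team_id INTEGER PRIMARY KEY, name TEXT, wins INTEGER);
    CREATE TABLE coaches (coach_id INTEGER PRIMARY KEY, team_id INTEGER, wins INTEGER);
    INSERT INTO teams VALUES (1, 'Hawks', 50);
    -- Two coaches split the season, so each coach row holds only part of the total
    INSERT INTO coaches VALUES (10, 1, 30), (11, 1, 20);
""")

# Correct attribution: the team's wins live in teams.wins
team_wins = conn.execute(
    "SELECT t1.wins FROM teams t1 WHERE t1.name LIKE 'Hawks';"
).fetchone()[0]

# Wrong attribution: coaches.wins yields one partial value per coach
coach_rows = conn.execute(
    "SELECT t2.wins FROM teams t1 JOIN coaches t2 ON t1.team_id = t2.team_id "
    "WHERE t1.name LIKE 'Hawks';"
).fetchall()

print(team_wins)
print(coach_rows)
```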

### Pillar 3: Simplified Query Patterns (20% of accuracy)

**PREFER Simple Approaches:**

For Finding Extremes:
```sql
-- ✅ Simple ORDER BY
SELECT column
FROM table
ORDER BY metric DESC
LIMIT 1;

-- ❌ Complex subquery
SELECT column
FROM table
WHERE metric = (SELECT MAX(metric) FROM table);
```
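The two patterns above differ when several rows share the extreme value. `LIMIT 1` returns only one of them, and which one is unspecified. The `= (SELECT MAX(...))` form returns all of them. The following minimal sketch uses a hypothetical `products` table and runs through Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical table in which two products tie for the highest price
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("lamp", 40.0), ("desk", 90.0), ("chair", 90.0)],
)

# Simple pattern: exactly one row, even when there is a tie
top_one = conn.execute(
    "SELECT name FROM products ORDER BY price DESC LIMIT 1;"
).fetchall()

# Subquery pattern: every row that matches the maximum
all_max = conn.execute(
    "SELECT name FROM products WHERE price = (SELECT MAX(price) FROM products);"
).fetchall()

print(top_one)
print(all_max)
```

This difference matters only when a question explicitly expects every tied row.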

**Direction Rules:**
- "Most/Highest" → `ORDER BY DESC`
- "Least/Lowest" → `ORDER BY ASC`
- "Best ranking" → `ORDER BY rank ASC` (1 is best)
- "Worst ranking" → `ORDER BY rank DESC`
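- "Top N" → `ORDER BY metric DESC LIMIT N`
- "Least/Lowest" on a column that may contain NULL → add `WHERE metric IS NOT NULL` (SQLite sorts NULLs first under `ASC`)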
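SQLite treats NULL as smaller than any other value. NULLs therefore sort first under `ASC` and last under `DESC`. The following minimal sketch uses a hypothetical `players` table with one missing score and runs through Python's built-in `sqlite3` module. It shows that a "lowest" query can return the NULL row unless NULLs are filtered out.

```python
import sqlite3

# Hypothetical table where one player's score is missing (NULL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("ana", 12), ("ben", None), ("cy", 7)],
)

# ASC puts the NULL row first, so this picks the player with no score
naive_lowest = conn.execute(
    "SELECT name FROM players ORDER BY score ASC LIMIT 1;"
).fetchone()[0]

# Filtering NULLs first gives the lowest actual score
lowest = conn.execute(
    "SELECT name FROM players WHERE score IS NOT NULL ORDER BY score ASC LIMIT 1;"
).fetchone()[0]

# "Top N" with DESC: NULLs sort last, so they do not displace real values here
top_two = conn.execute(
    "SELECT name FROM players ORDER BY score DESC LIMIT 2;"
).fetchall()
```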

### Pillar 4: Evidence Compliance (10% of accuracy)

**When evidence is provided, follow it EXACTLY:**

Evidence Patterns:
- "refers to X = 'value'" → Use `X = 'value'` literally
- "X > Y" → Use exact comparison
- "calculated as formula" → Apply formula exactly
- "MAX(column)" → Use MAX on that specific column

**Evidence Override Rule**: Evidence specifications override the simplicity preferences above.

### Pillar 5: String Matching Strategy (5% of accuracy)

**Default String Matching Rules:**
- Names (person/company/product) → Use `LIKE`
- Descriptions/text fields → Use `LIKE`
- Status/category values → Try `LIKE` first
- Codes/IDs → Use `=` (exact match)
- Enums/constants → Use `=`

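**Remember:** SQLite `LIKE` is case-insensitive for ASCII letters by default, while `=` is case-sensitive. Without `%` or `_` wildcards, `LIKE` still matches the whole value, so add `%` only when a partial match is intended.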
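The matching rules above can be checked directly with a minimal sketch that evaluates literal expressions through Python's built-in `sqlite3` module. The company name is a made-up value, and the behavior shown assumes the default `case_sensitive_like` setting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expression):
    # Run a single boolean SQL expression; SQLite returns 1 (true) or 0 (false)
    return conn.execute("SELECT " + expression + ";").fetchone()[0]

like_ascii = evaluate("'Acme Corp' LIKE 'acme corp'")  # ASCII case is ignored
equals_ascii = evaluate("'Acme Corp' = 'acme corp'")   # = compares case-sensitively
like_whole = evaluate("'Acme Corp' LIKE 'acme'")       # no wildcard: whole-value match
like_prefix = evaluate("'Acme Corp' LIKE 'acme%'")     # % permits a partial match
```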

## JOIN PATTERNS AND BEST PRACTICES

### ALWAYS Use Aliases and Qualify Columns

**CORRECT:**
```sql
SELECT t1.name
FROM customers t1
JOIN orders t2 ON t1.id = t2.customer_id
WHERE t2.status = 'shipped';
```

**INCORRECT:**
```sql
SELECT name  -- ❌ Unqualified
FROM customers
JOIN orders ON id = customer_id;  -- ❌ Ambiguous if both tables have an id column
```
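SQLite rejects a truly ambiguous column outright rather than guessing. The following minimal sketch uses Python's built-in `sqlite3` module and assumes hypothetical `customers` and `orders` tables that both define `id`. The qualified query runs, and the unqualified one raises an error.

```python
import sqlite3

# Hypothetical schema: both tables have an "id" column
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo');
    INSERT INTO orders VALUES (100, 1, 'shipped'), (101, 2, 'pending');
""")

# Qualified version: runs and returns only customers with shipped orders
shipped = conn.execute("""
    SELECT t1.name
    FROM customers t1
    JOIN orders t2 ON t1.id = t2.customer_id
    WHERE t2.status = 'shipped';""").fetchall()

# Unqualified version: SQLite cannot tell which table's "id" is meant
try:
    conn.execute("SELECT name FROM customers JOIN orders ON id = customer_id;")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(shipped)
print(error)
```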

## SQLite-SPECIFIC SYNTAX

### Critical SQLite Functions
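- String concat: `||` (CONCAT() exists only in SQLite 3.44+)
- Date extraction: `STRFTIME('%Y', date)` not YEAR()
- String length: `LENGTH()` not LEN()
- Null handling: `COALESCE()` or `IFNULL()` not ISNULL()
- Limit rows: `LIMIT n` not TOP n
- Boolean values: 1/0 (SQLite has no boolean type; TRUE/FALSE are only aliases for 1/0, recognized since SQLite 3.23)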
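The following minimal sketch runs several of these SQLite forms in a single query through Python's built-in `sqlite3` module. The literal values are arbitrary examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute("""
    SELECT
        'Ada' || ' ' || 'Lovelace',    -- string concatenation with ||
        LENGTH('Lovelace'),            -- LENGTH, not LEN
        COALESCE(NULL, 'n/a'),         -- first non-NULL argument
        STRFTIME('%Y', '1843-07-10'),  -- year extracted as text
        (3 > 2)                        -- comparisons yield 1 or 0
""").fetchone()

print(row)
```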

### Date Patterns
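- Extract year: `STRFTIME('%Y', date_column)` (returns text such as '2020')
- Extract month: `STRFTIME('%m', date_column)` (returns zero-padded text such as '03')
- Compare extracted parts with text literals (`= '2020'`) or wrap them in `CAST(... AS INTEGER)`; `= 2020` never matches
- Date arithmetic: `DATE(date_column, '+1 day')`
- Current date: `DATE('now')` (UTC)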
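Because `STRFTIME` returns text, the comparison literal has to be text too. The following minimal sketch assumes a hypothetical `events` table holding ISO-8601 date strings and runs through Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical events table storing ISO-8601 date strings
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("launch", "2020-03-15"), ("review", "2021-11-02")],
)

def scalar(sql):
    # Return the single value produced by a one-row, one-column query
    return conn.execute(sql).fetchone()[0]

year = scalar("SELECT STRFTIME('%Y', event_date) FROM events WHERE name = 'launch';")
month = scalar("SELECT STRFTIME('%m', event_date) FROM events WHERE name = 'launch';")
next_day = scalar("SELECT DATE(event_date, '+1 day') FROM events WHERE name = 'launch';")

# STRFTIME returns TEXT, so an integer literal never equals it
matches_int = scalar(
    "SELECT COUNT(*) FROM events WHERE STRFTIME('%Y', event_date) = 2020;"
)
matches_text = scalar(
    "SELECT COUNT(*) FROM events WHERE STRFTIME('%Y', event_date) = '2020';"
)
```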

## SPECIAL PATTERNS AND EDGE CASES

### Percentage Calculations
```sql
-- Correct percentage pattern (CAST avoids SQLite integer division)
SELECT CAST(COUNT(CASE WHEN condition THEN 1 END) AS REAL) * 100.0 / COUNT(*)
FROM table;
```
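The CAST in the percentage pattern matters because SQLite truncates integer-by-integer division. The following minimal sketch uses a hypothetical `orders` table in which one of four rows matches, and runs through Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical orders table: 1 of 4 orders is shipped
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "shipped"), (2, "pending"), (3, "pending"), (4, "pending")],
)

# Integer division: 1 / 4 truncates to 0 before multiplying by 100
naive = conn.execute(
    "SELECT COUNT(CASE WHEN status = 'shipped' THEN 1 END) / COUNT(*) * 100 "
    "FROM orders;"
).fetchone()[0]

# CAST to REAL keeps the fractional part
percentage = conn.execute(
    "SELECT CAST(COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS REAL) "
    "* 100.0 / COUNT(*) FROM orders;"
).fetchone()[0]

print(naive, percentage)
```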

### Counting Patterns
- "How many customers?" (counted from a table with one row per order) → `COUNT(DISTINCT customer_id)`
- "How many orders?" → `COUNT(*)`
- "How many different X?" → `COUNT(DISTINCT X)`
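When an entity repeats across rows, `COUNT(*)` and `COUNT(DISTINCT ...)` answer different questions. The following minimal sketch uses a hypothetical `orders` table in which one customer placed two orders, and runs through Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical orders table: customer 1 placed two orders
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(100, 1), (101, 1), (102, 2)],
)

# "How many orders?" counts rows
how_many_orders = conn.execute("SELECT COUNT(*) FROM orders;").fetchone()[0]

# "How many customers?" counts each customer once
how_many_customers = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM orders;"
).fetchone()[0]

print(how_many_orders, how_many_customers)
```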

### Grouping Without Selecting
```sql
-- "What is the highest total?" (by group)
SELECT SUM(amount)
FROM sales
GROUP BY store_id
ORDER BY SUM(amount) DESC
LIMIT 1;
```
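The grouping query above can be run against a hypothetical `sales` table with two stores, using Python's built-in `sqlite3` module. The result is a single column holding the largest per-store total. `store_id` drives the grouping but never appears in the output.

```python
import sqlite3

# Hypothetical sales table: store 1 totals 40.0, store 2 totals 25.0
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (1, 30.0), (2, 25.0)],
)

cursor = conn.execute("""
    SELECT SUM(amount)
    FROM sales
    GROUP BY store_id
    ORDER BY SUM(amount) DESC
    LIMIT 1;""")
highest_total = cursor.fetchall()

# Only the aggregate is returned: one column, no store_id
column_count = len(cursor.description)

print(highest_total, column_count)
```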

## PRE-QUERY CHECKLIST

Before generating SQL, verify:
- ✓ How many columns should I return?
- ✓ Which table owns each column?
- ✓ Should I use LIKE or = for strings?
- ✓ Is there evidence to follow exactly?
- ✓ Are all columns qualified with aliases?
- ✓ Am I using SQLite syntax?
- ✓ Is this the simplest approach?

## COMMON FAILURE PATTERNS TO AVOID

1. **Extra Columns** - Adding unrequested columns "for context"
2. **Wrong String Match** - Using = when LIKE needed for names
3. **Complex Query** - Nested subqueries when ORDER BY works
4. **Wrong Table** - Using wrong table's version of a column
5. **Missing Aliases** - Ambiguous columns in joins
6. **Wrong Syntax** - Using non-SQLite functions

## FINAL REMINDERS

**Success Formula:**
1. Count required columns
2. Identify correct tables
3. Choose simplest pattern
4. Use LIKE for names/text
5. Follow evidence exactly
6. Return ONLY what's asked

**Remember:** Precision wins. Return EXACTLY what's asked, use the simplest approach, follow evidence literally, and output clean SQL.