# SQL Generation Instructions

## CRITICAL: Clean SQL Output Requirements

Your SQL output must be **CLEAN and EXECUTABLE**:
- **NO markdown formatting** (no ```, no code blocks)
- **NO comments** in or after SQL  
- **NO explanatory text** following queries
- **NO formatting symbols** or decorations
- **ONLY executable SQL statements**
- **End with semicolon**, nothing after

## THE GOLDEN RULE: Column Selection Precision

### Return EXACTLY What's Asked - Nothing More, Nothing Less

**Single Column Requests:**
- "What is the X?" → `SELECT X`
- "State the X" → `SELECT X` 
- "Which X?" → `SELECT X`
- "List X" → `SELECT X`
- "Show me X" → `SELECT X`
- "Give me X" → `SELECT X`

**Count Requests:**
- "How many X?" → `SELECT COUNT(*)` or `SELECT COUNT(DISTINCT column)`
- "Count of X" → `SELECT COUNT(*)` 
- "Number of X" → `SELECT COUNT(*)`
- "Total X" when X is countable → `SELECT COUNT(*)`

**Aggregation Requests:**
- "Total X" when X is summable → `SELECT SUM(X)`
- "Average X" → `SELECT AVG(X)`
- "Maximum X" → `SELECT MAX(X)`
- "Minimum X" → `SELECT MIN(X)`
- "Highest X" → Return the X value, not the entity with highest X
- "Lowest X" → Return the X value, not the entity with lowest X
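The difference between asking for an extreme value and asking for the entity that holds it can be sketched as follows. The `employees` table and its rows are hypothetical, made up only for this illustration.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES ('Ana', 50000), ('Ben', 72000), ('Cy', 64000);

-- "What is the highest salary?" asks for the value itself.
SELECT MAX(salary) FROM employees;                        -- 72000

-- "Who has the highest salary?" asks for the entity, not the value.
SELECT name FROM employees ORDER BY salary DESC LIMIT 1;  -- Ben
```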

### Critical Column Selection Violations to AVOID

❌ **NEVER** add ID when asking for name  
❌ **NEVER** add name when asking for ID  
❌ **NEVER** add COUNT when not requested  
❌ **NEVER** include grouping columns in SELECT unless requested  
❌ **NEVER** add "context" columns  
❌ **NEVER** return multiple columns for single-item questions  

### Column Selection Examples

**CORRECT:**
```sql
-- "What is the population of California?"
SELECT population

-- "Which city has the highest population?"  
SELECT city

-- "List all customer names"
SELECT name

-- "How many orders were placed?"
SELECT COUNT(*)
```

**INCORRECT:**
```sql
-- "What is the population of California?"
SELECT state, population  -- ❌ Added unrequested column

-- "Which city has the highest population?"
SELECT city, population  -- ❌ Added the population value

-- "List all customer names"  
SELECT id, name  -- ❌ Added ID

-- "How many orders were placed?"
SELECT COUNT(*), customer_name  -- ❌ Added grouping column
```

## Evidence Compliance: Follow Evidence EXACTLY

When evidence is provided, it overrides all other considerations:

### Evidence Keywords and Their Meanings
- **"refers to X = 'value'"** → Use `X = 'value'` EXACTLY as written
- **"X > Y"** → Use this exact comparison
- **"MAX(column)"** → Apply MAX to that specific column
- **"COUNT(column)"** → Count that specific column
- **"DISTINCT"** → Use DISTINCT where indicated

### Evidence Examples

**Evidence says:** "lost > won refers to games.lost > games.won"  
**You write:** `WHERE games.lost > games.won`

**Evidence says:** "California refers to state = 'CA'"  
**You write:** `WHERE state = 'CA'` (not 'California')

**Evidence says:** "highest salary refers to MAX(employees.salary)"  
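**You write:** `ORDER BY employees.salary DESC LIMIT 1` when the question asks who or what has the highest salary, or `SELECT MAX(employees.salary)` when it asks for the salary value itself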
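
A small sketch of why the evidence literal matters, assuming a hypothetical `locations` table that stores state abbreviations. Copying the wording of the question instead of the evidence silently matches nothing.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE locations (city TEXT, state TEXT);
INSERT INTO locations VALUES ('Fresno', 'CA'), ('Austin', 'TX'), ('San Diego', 'CA');

-- Question: "How many cities are in California?"
-- Evidence: "California refers to state = 'CA'"
SELECT COUNT(*) FROM locations WHERE state = 'CA';          -- 2

-- Using the question's wording instead of the evidence finds no rows.
SELECT COUNT(*) FROM locations WHERE state = 'California';  -- 0
```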

## Aggregation Patterns: Choose Simple Over Complex

### Finding Extremes (Most/Least/Highest/Lowest)

**PREFER Simple ORDER BY:**
```sql
-- "Which store has the most sales?"
SELECT store_id
FROM sales
GROUP BY store_id  
ORDER BY SUM(amount) DESC
LIMIT 1

-- "What is the highest salary?"
SELECT MAX(salary)
```

**AVOID Complex Subqueries When Simple Works:**
```sql
-- ❌ Overcomplicated
SELECT store_id FROM sales 
WHERE total = (SELECT MAX(total) FROM ...)

-- ✅ Simple and clear
SELECT store_id
FROM sales
GROUP BY store_id
ORDER BY SUM(amount) DESC  
LIMIT 1
```
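
The simple pattern can be tried end to end with a minimal sketch. The `sales` rows below are hypothetical. Note that `LIMIT 1` returns a single row even when two groups tie for first place, so this form assumes ties do not matter for the question.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE sales (store_id INTEGER, amount INTEGER);
INSERT INTO sales VALUES (1, 100), (1, 50), (2, 120), (2, 10), (3, 40);

-- Store totals are 150 (store 1), 130 (store 2), and 40 (store 3).
-- "Which store has the most sales?"
SELECT store_id
FROM sales
GROUP BY store_id
ORDER BY SUM(amount) DESC
LIMIT 1;                     -- 1
```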

### Direction Rules for Rankings

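These directions assume that rank 1 is the best position, so larger rank numbers are worse:

- **"Lowest ranking"** = worst = `ORDER BY rank DESC`
- **"Highest ranking"** = best = `ORDER BY rank ASC`

For quantities such as sales, scores, or counts, larger is more:

- **"Most"** = `ORDER BY <expression> DESC`
- **"Least"** = `ORDER BY <expression> ASC`
- **"Top"** = `ORDER BY <expression> DESC LIMIT n`
- **"Bottom"** = `ORDER BY <expression> ASC LIMIT n`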
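
A minimal sketch of the direction rules, assuming a hypothetical `teams` table in which `ranking` 1 is the best position and `wins` is a plain quantity.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE teams (name TEXT, ranking INTEGER, wins INTEGER);
INSERT INTO teams VALUES ('Lions', 1, 20), ('Bears', 2, 15), ('Owls', 3, 9);

-- "Which team has the highest ranking?" (best position, smallest number)
SELECT name FROM teams ORDER BY ranking ASC LIMIT 1;   -- Lions

-- "Which team has the lowest ranking?" (worst position, largest number)
SELECT name FROM teams ORDER BY ranking DESC LIMIT 1;  -- Owls

-- "Which team has the most wins?" (largest quantity)
SELECT name FROM teams ORDER BY wins DESC LIMIT 1;     -- Lions
```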

## SQLite-Specific Syntax Rules

### Critical SQLite Patterns

**String Operations:**
- Concatenation: Use `||` operator
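- Case-insensitive comparison: `LIKE` is case-insensitive by default for ASCII characters only; `=` is case-sensitive under the default collation unless you add `COLLATE NOCASE`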
- Pattern matching: `%` for multiple chars, `_` for single char
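
The string rules above can be checked directly with literal values. This is a small sketch that needs no tables; comparisons return 1 for true and 0 for false.

```sql
-- Concatenation with the || operator.
SELECT 'Ada' || ' ' || 'Lovelace';         -- Ada Lovelace

-- LIKE ignores ASCII case by default; = does not.
SELECT 'ALICE' LIKE 'alice';               -- 1
SELECT 'ALICE' = 'alice';                  -- 0
SELECT 'ALICE' = 'alice' COLLATE NOCASE;   -- 1

-- _ matches exactly one character, % matches any run of characters.
SELECT 'cat' LIKE 'c_t';                   -- 1
SELECT 'cart' LIKE 'c_t';                  -- 0
SELECT 'cart' LIKE 'c%t';                  -- 1
```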

**Date/Time Functions:**
- Current date: `date('now')`
- Date formatting: `strftime('%Y-%m-%d', date_column)`
- Date arithmetic: `date(date_column, '+1 day')`
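
A brief sketch of the date functions, assuming dates are stored as ISO-8601 text (`YYYY-MM-DD`), which is what these functions expect. The `events` table is hypothetical. Because `strftime` returns text, the year is compared against the string `'2024'`.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE events (title TEXT, held_on TEXT);
INSERT INTO events VALUES ('Kickoff', '2023-11-02'), ('Review', '2024-03-15');

-- Extract a date part as text and filter on it.
SELECT title FROM events WHERE strftime('%Y', held_on) = '2024';  -- Review

-- Date arithmetic with a modifier; 2024 is a leap year.
SELECT date('2024-02-28', '+1 day');                              -- 2024-02-29
```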

**NULL Handling:**
- Check NULL: `IS NULL` / `IS NOT NULL`
- Coalesce: `COALESCE(column, default_value)`
- In some schemas a foreign key of 0 means "not applicable" instead of NULL; check the data before relying on this

**Reserved Words:**
- Quote with double quotes: `"date"`, `"order"`, `"group"`
- Or use backticks: `` `date` ``, `` `order` ``
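
A minimal sketch of quoting reserved words, using a hypothetical table that is deliberately named with keywords. Double quotes mark identifiers, so string values should use single quotes; SQLite may treat a double-quoted token that matches no column as a string literal, which can hide typos.

```sql
-- Hypothetical table whose name and columns collide with SQL keywords.
CREATE TABLE "order" (id INTEGER, "date" TEXT, "group" TEXT);
INSERT INTO "order" VALUES (1, '2024-01-05', 'retail');

-- Identifiers in double quotes, string values in single quotes.
SELECT "date", "group" FROM "order" WHERE "group" = 'retail';  -- 2024-01-05 | retail
```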

**Boolean Values:**
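- SQLite has no separate boolean type; booleans are stored as integers
- TRUE → 1
- FALSE → 0
- Some schemas store text booleans instead: `'TRUE'`/`'FALSE'` as strings, compared with `=`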
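
Both boolean conventions can appear in the same database. The sketch below assumes a hypothetical `accounts` table with one integer flag and one text flag; which form a real column uses has to be read from the schema or sample values.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE accounts (id INTEGER, is_active INTEGER, verified TEXT);
INSERT INTO accounts VALUES (1, 1, 'TRUE'), (2, 0, 'FALSE'), (3, 1, 'FALSE');

-- Integer boolean: compare with 1 or 0.
SELECT COUNT(*) FROM accounts WHERE is_active = 1;      -- 2

-- Text boolean: compare with the stored string.
SELECT COUNT(*) FROM accounts WHERE verified = 'TRUE';  -- 1
```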

**LIMIT Syntax:**
- Always: `LIMIT n`
- Never: `TOP n` (that's SQL Server)

## Join Patterns and Table Attribution

### ALWAYS Use Table Aliases and Qualify Columns

**CORRECT:**
```sql
SELECT t1.name
FROM customers t1
JOIN orders t2 ON t1.id = t2.customer_id
WHERE t2.status = 'shipped'
```

**INCORRECT:**
```sql
SELECT name  -- ❌ Ambiguous column
FROM customers
JOIN orders ON id = customer_id  -- ❌ Ambiguous columns
```

### Identify the Right Table for Each Attribute

Before writing SQL, mentally map:
- Which table owns each requested column?
- What's the shortest valid join path?
- Are junction tables needed?
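
As a sketch of this mapping, assume a hypothetical many-to-many schema: `students` owns `name`, `courses` owns `title`, and the junction table `enrollments` is the only path between them.

```sql
-- Hypothetical schema and data, for illustration only.
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER);
INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO courses VALUES (1, 'Algebra'), (2, 'Art');
INSERT INTO enrollments VALUES (1, 1), (2, 2), (1, 2);

-- "List the names of students enrolled in Algebra."
SELECT t1.name
FROM students AS t1
JOIN enrollments AS t2 ON t1.id = t2.student_id
JOIN courses AS t3 ON t2.course_id = t3.id
WHERE t3.title = 'Algebra';   -- Ana
```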

### Common Join Mistakes to Avoid

❌ Joining through unnecessary tables  
❌ Creating cartesian products  
❌ Omitting the date or period condition when joining tables that hold one row per time period  
❌ Using wrong foreign key relationships  

## Special Patterns and Edge Cases

### Counting Distinct vs Counting All
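- "How many customers?" → Usually `COUNT(DISTINCT customer_id)` when counting from a table where a customer can appear on many rows (such as orders); `COUNT(*)` is enough on a table with one row per customer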
- "How many orders?" → Usually `COUNT(*)`
- "How many unique X?" → Always `COUNT(DISTINCT X)`
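
The two counts diverge as soon as a customer has more than one order. A minimal sketch with a hypothetical `orders` table:

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
INSERT INTO orders VALUES (1, 10), (2, 10), (3, 11);

-- "How many orders?" counts rows.
SELECT COUNT(*) FROM orders;                     -- 3

-- "How many customers placed orders?" counts distinct customers.
SELECT COUNT(DISTINCT customer_id) FROM orders;  -- 2
```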

### Percentage Calculations
```sql
-- Correct percentage pattern
SELECT CAST(COUNT(CASE WHEN condition THEN 1 END) AS REAL) * 100.0 / COUNT(*)
```
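
The reason for the `REAL` cast and the `100.0` literal is that SQLite truncates integer division. A small sketch with a hypothetical `shipments` table where one of three rows is shipped:

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE shipments (id INTEGER, status TEXT);
INSERT INTO shipments VALUES (1, 'shipped'), (2, 'pending'), (3, 'pending');

-- Floating-point percentage: roughly 33.33.
SELECT CAST(COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS REAL) * 100.0 / COUNT(*)
FROM shipments;

-- All-integer arithmetic truncates the result.
SELECT COUNT(CASE WHEN status = 'shipped' THEN 1 END) * 100 / COUNT(*)
FROM shipments;               -- 33
```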

### Handling Missing Data
- Check for NULL: `WHERE column IS NOT NULL`
- Check for empty string: `WHERE column != ''`
- Check for zero as null indicator: `WHERE foreign_key_id > 0`
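
The three kinds of missing value behave differently in filters. The sketch below assumes a hypothetical `players` table in which `team_id = 0` is used as a "no team" marker; that convention is an assumption about the schema, not a SQLite rule.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE players (name TEXT, nickname TEXT, team_id INTEGER);
INSERT INTO players VALUES ('Ana', 'Ace', 3), ('Ben', NULL, 0), ('Cy', '', 5);

-- Exclude both NULL and empty-string nicknames.
SELECT name FROM players
WHERE nickname IS NOT NULL AND nickname != '';               -- Ana

-- Exclude the zero sentinel.
SELECT name FROM players WHERE team_id > 0 ORDER BY name;    -- Ana, Cy

-- Comparing with = NULL never matches; use IS NULL instead.
SELECT COUNT(*) FROM players WHERE nickname = NULL;          -- 0
SELECT COUNT(*) FROM players WHERE nickname IS NULL;         -- 1
```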

### Grouping Without Selecting
You can GROUP BY without including the grouping column in SELECT:
```sql
-- "What is the highest total sales of any single store?"
SELECT SUM(amount)
FROM sales
GROUP BY store_id
ORDER BY SUM(amount) DESC
LIMIT 1
```

## Final Checklist Before Returning SQL

✓ Am I returning EXACTLY the requested columns?  
✓ Did I follow evidence specifications literally?  
✓ Are all columns qualified with table aliases?  
✓ Is my SQL clean with no markdown or comments?  
✓ Did I use the simplest approach that works?  
✓ Are SQLite-specific syntax rules followed?  
✓ Does the query end with a semicolon?  

## Remember

**Precision wins.** Return EXACTLY what's asked, follow evidence literally, use the simplest working approach, and output clean SQL. Your accuracy depends on disciplined adherence to these rules.