# SQL Generation Instructions with Evidence Reconciliation

## CRITICAL OUTPUT REQUIREMENTS

Your SQL must be **CLEAN and EXECUTABLE**:
- **NO markdown formatting** (no ```, no code blocks)
- **NO comments** in or after SQL
- **NO explanatory text** following queries
- **NO formatting symbols** or decorations
- **ONLY executable SQL statements**
- **End with semicolon**, nothing after
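
The rules above can be checked mechanically before a query is returned. The following is a minimal sketch, assuming a hypothetical helper named `is_clean_sql`; it is a heuristic, not a parser, and it would wrongly reject a query whose string literal contains `--` or `/*`.

```python
import sqlite3

def is_clean_sql(text: str) -> bool:
    """Heuristic check of the output rules (hypothetical helper, not a parser)."""
    stripped = text.strip()
    if "```" in stripped:                      # markdown code fence present
        return False
    if "--" in stripped or "/*" in stripped:   # SQL comment markers present
        return False
    if not stripped.endswith(";"):             # anything after the final semicolon fails
        return False
    # complete_statement only confirms the text ends in a complete,
    # semicolon-terminated statement; it does not validate names or syntax.
    return sqlite3.complete_statement(stripped)

clean = "SELECT name FROM customers WHERE id = 1;"
fenced = "```sql\nSELECT name FROM customers;\n```"
commented = "SELECT name FROM customers; -- all customers"
unterminated = "SELECT name FROM customers"

results = [is_clean_sql(q) for q in (clean, fenced, commented, unterminated)]
```

Only the first query passes; the other three each break one of the rules.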

## THE FIVE PILLARS OF ACCURATE SQL (Enhanced)

### Pillar 1: Ultra-Strict Column Selection (40% of accuracy)

**THE GOLDEN RULE**: If the question asks for ONE thing, return EXACTLY ONE column.

#### Single Answer Patterns
- "What is the X?" → `SELECT X`
- "State the X" → `SELECT X`
- "Which X?" → `SELECT X`
- "List X" → `SELECT X`
- "Show the X" → `SELECT X`
- "Name of X" → `SELECT X`
- "Identify the X" → `SELECT X`

#### Count Patterns (Enhanced)
- "How many X?" → `SELECT COUNT(*)`
- "Number of X" → `SELECT COUNT(*)`
- "Total X" (when countable) → `SELECT COUNT(*)`
- "How many unique/different X?" → `SELECT COUNT(DISTINCT X)`
- "How many distinct X?" → `SELECT COUNT(DISTINCT X)`

#### Aggregation Patterns
- "Total X" (when summable) → `SELECT SUM(X)`
- "Average/Mean X" → `SELECT AVG(X)`
- "Maximum/Highest X" → `SELECT MAX(X)`
- "Minimum/Lowest X" → `SELECT MIN(X)`
- "Sum of X" → `SELECT SUM(X)`

#### Critical Violations to AVOID
❌ Adding ID when asking for name
❌ Including name when asking for ID
❌ Adding COUNT when not requested
❌ Including grouping columns that were not explicitly requested
❌ Returning multiple columns for single-item questions
❌ Adding "context" columns for clarity
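
To make the golden rule concrete, here is a small sketch using Python's built-in `sqlite3` module. The `employees` table and its rows are invented for illustration. It contrasts an inflated answer with an exact one for the question "What is the name of the highest-paid employee?".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL);
INSERT INTO employees (id, name, salary) VALUES
    (1, 'Ada', 120000), (2, 'Grace', 135000), (3, 'Linus', 110000);
""")

# Column inflation: id and salary were never asked for.
inflated = conn.execute(
    "SELECT id, name, salary FROM employees ORDER BY salary DESC LIMIT 1;"
)
# Exact answer: one requested item, one column.
exact = conn.execute(
    "SELECT name FROM employees ORDER BY salary DESC LIMIT 1;"
)

inflated_columns = [d[0] for d in inflated.description]
exact_columns = [d[0] for d in exact.description]
exact_rows = exact.fetchall()
```

Both queries find the same row, but only the second returns a result set whose shape matches the question.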

### Pillar 2: Evidence Reconciliation & Fallbacks (25% of accuracy)

**NEW: Proactive Evidence Error Handling**

#### When Evidence Names a Column → Check the Schema For
- "CompanyName" → Look for: company_name, CompanyName, supplier_name, customer_name
- "ShipVia" → Look for: ship_via, ShipVia, shipper_id, shipping_method
- "Employee territory" → If no territory table exists, check employee_territories junction
- "Salary" → If no salary column, check: compensation, wage, pay_rate
- Spaces in names → Try both with and without spaces

#### Evidence Override Rules
1. **If evidence specifies a column that doesn't exist exactly:**
   - First check for case variations
   - Then check for underscore/space variations
   - Finally check for semantic equivalents
   
2. **If evidence specifies a calculation:**
   - Use the formula EXACTLY as given
   - Don't optimize or simplify

3. **If evidence gives a filter condition:**
   - Apply it literally first
   - Only adjust if it causes an error
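
The lookup order in rule 1 can be sketched as a small resolver. This is an illustration under assumptions: `resolve_column`, `normalize`, and the `SEMANTIC_EQUIVALENTS` map are hypothetical names, the synonym lists simply reuse the examples given above, and the table name is assumed to be a trusted identifier.

```python
import sqlite3

# Hypothetical synonym map, seeded from the examples in this section.
SEMANTIC_EQUIVALENTS = {
    "salary": ["compensation", "wage", "pay_rate"],
    "companyname": ["supplier_name", "customer_name"],
}

def normalize(name: str) -> str:
    """Lower-case and drop underscores/spaces so spelling variants compare equal."""
    return name.lower().replace("_", "").replace(" ", "")

def resolve_column(conn, table, evidence_name):
    """Return the schema column that best matches evidence_name, or None."""
    # Column names are the second field of each PRAGMA table_info row.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if evidence_name in columns:                      # exact match
        return evidence_name
    for col in columns:                               # 1. case variations
        if col.lower() == evidence_name.lower():
            return col
    for col in columns:                               # 2. underscore/space variations
        if normalize(col) == normalize(evidence_name):
            return col
    for candidate in SEMANTIC_EQUIVALENTS.get(normalize(evidence_name), []):
        for col in columns:                           # 3. semantic equivalents
            if normalize(col) == normalize(candidate):
                return col
    return None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER, company_name TEXT, pay_rate REAL)")

by_case = resolve_column(conn, "suppliers", "SUPPLIER_ID")
by_underscore = resolve_column(conn, "suppliers", "CompanyName")
by_synonym = resolve_column(conn, "suppliers", "Salary")
missing = resolve_column(conn, "suppliers", "Region")
```

When nothing matches, the sketch returns `None` rather than guessing, which leaves the decision to the caller.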

### Pillar 3: Simplified Query Patterns (20% of accuracy)

**ALWAYS Prefer Simple Approaches:**

#### For Finding Extremes (Enhanced)
```sql
-- ✅ PREFERRED: Simple ORDER BY (returns one row, even if several rows tie)
SELECT column
FROM table
ORDER BY metric DESC
LIMIT 1;

-- ⚠️ ONLY if specifically needed (e.g., every tied row is wanted): Subquery
SELECT column
FROM table
WHERE metric = (SELECT MAX(metric) FROM table);
```

#### Direction Rules (Comprehensive)
- "Most/Highest/Maximum" → `ORDER BY DESC`
- "Least/Lowest/Minimum" → `ORDER BY ASC`
- "Best ranking" → `ORDER BY rank ASC` (1 is best)
- "Worst ranking" → `ORDER BY rank DESC`
- "Top N" → `ORDER BY metric DESC LIMIT N`
- "Bottom N" → `ORDER BY metric ASC LIMIT N`
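
The two extreme-finding patterns differ when rows tie on the metric. The sketch below uses an invented `players` table to show the difference; which of the tied rows `LIMIT 1` returns is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (name TEXT, points INTEGER);
INSERT INTO players (name, points) VALUES ('Ann', 30), ('Bob', 30), ('Cy', 12);
""")

# "Most points" -> ORDER BY ... DESC LIMIT 1: exactly one row, even with a tie.
top_one = conn.execute(
    "SELECT name FROM players ORDER BY points DESC LIMIT 1;"
).fetchall()

# Subquery form: every row that shares the maximum.
all_top = conn.execute(
    "SELECT name FROM players "
    "WHERE points = (SELECT MAX(points) FROM players) ORDER BY name;"
).fetchall()

# "Fewest points" -> ORDER BY ... ASC LIMIT 1.
bottom_one = conn.execute(
    "SELECT name FROM players ORDER BY points ASC LIMIT 1;"
).fetchall()
```

If the question expects a single answer, the `ORDER BY ... LIMIT 1` form matches it; the subquery form is the one to reach for only when all tied rows are wanted.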

### Pillar 4: Smart String Matching (10% of accuracy)

**Adaptive String Matching Strategy:**

#### Default Rules by Data Type
- **Person/Company Names** → Use `LIKE '%value%'`
- **Product Names** → Use `LIKE '%value%'`
- **Descriptions/Text** → Use `LIKE '%value%'`
- **Status/Category** → Try `= 'value'` first, then `LIKE`
- **Codes/IDs** → Always use `=` (exact match)
- **Enums/Constants** → Always use `=`

#### Special Cases
- If the value already contains wildcards (%), use it as-is with `LIKE`
- If evidence specifies exact match, use `=`
- Remember: SQLite `LIKE` is case-insensitive by default for ASCII characters only; `=` is case-sensitive
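
A short sketch of how `LIKE` and `=` behave under SQLite's default settings, using an invented `products` table whose two codes differ only in case:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (code TEXT, name TEXT);
INSERT INTO products (code, name) VALUES
    ('AB-1', 'Gumbo Mix'), ('ab-1', 'Deluxe GUMBO');
""")

# Name lookup: LIKE with wildcards matches regardless of ASCII case.
like_rows = conn.execute(
    "SELECT code FROM products WHERE name LIKE '%gumbo%' ORDER BY code;"
).fetchall()

# Code lookup: = is exact and case-sensitive, so only one code matches.
eq_rows = conn.execute(
    "SELECT name FROM products WHERE code = 'AB-1';"
).fetchall()

# = with the wrong case finds nothing.
eq_wrong_case = conn.execute(
    "SELECT code FROM products WHERE name = 'gumbo mix';"
).fetchall()
```

This is why codes and IDs take `=` while free-text names default to `LIKE`.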

### Pillar 5: Table Attribution Clarity (5% of accuracy)

**Before writing SQL, verify:**
- Which table is the authoritative source for each column?
- Are you using the correct table's version of a column?
- For joins, which table should filters apply to?

**Common Attribution Traps:**
- Team stats: teams.wins not coaches.wins
- Season data: Check regular vs playoff tables
- Player stats: Verify correct season/year table
- Aggregated vs detail: Use appropriate granularity
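
As an illustration of the first trap, the sketch below builds hypothetical `teams` and `coaches` tables that both carry a `wins` column. The schema and numbers are invented; the coach is assumed to have led the team for only part of the season.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (team_id INTEGER PRIMARY KEY, name TEXT, wins INTEGER);
CREATE TABLE coaches (coach_id INTEGER PRIMARY KEY, team_id INTEGER, wins INTEGER);
INSERT INTO teams (team_id, name, wins) VALUES (1, 'Hawks', 50);
INSERT INTO coaches (coach_id, team_id, wins) VALUES (7, 1, 18);
""")

# Question: "How many wins did the Hawks have?" -> a team statistic.
team_wins = conn.execute(
    "SELECT t1.wins FROM teams t1 WHERE t1.name = 'Hawks';"
).fetchone()[0]

# Attribution error: same column name, but it describes the coach.
coach_wins = conn.execute(
    "SELECT t2.wins FROM teams t1 "
    "JOIN coaches t2 ON t1.team_id = t2.team_id WHERE t1.name = 'Hawks';"
).fetchone()[0]
```

Both queries run without error, which is what makes this kind of mistake easy to miss.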

## JOIN PATTERNS AND QUALIFICATION

### MANDATORY: Always Use Aliases and Qualify
```sql
-- ✅ CORRECT: All columns qualified
SELECT t1.name, t2.total
FROM customers t1
JOIN orders t2 ON t1.id = t2.customer_id
WHERE t2.status = 'shipped';

-- ❌ WRONG: Unqualified columns (ambiguous if both tables share a column name)
SELECT name, total
FROM customers
JOIN orders ON id = customer_id;
```
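
The failure mode can be reproduced with Python's `sqlite3` module. The sketch assumes both tables have an `id` column (the `orders.id` column and all rows are invented), which is what makes the unqualified reference ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total REAL, status TEXT);
INSERT INTO customers (id, name) VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders (id, customer_id, total, status) VALUES
    (10, 1, 250.0, 'shipped'), (11, 2, 90.0, 'pending');
""")

# Qualified version: every column carries its table alias.
qualified = conn.execute(
    "SELECT t1.name, t2.total FROM customers t1 "
    "JOIN orders t2 ON t1.id = t2.customer_id WHERE t2.status = 'shipped';"
).fetchall()

# Unqualified version: SQLite cannot tell which table's id is meant.
try:
    conn.execute("SELECT name, total FROM customers JOIN orders ON id = customer_id;")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
```

Qualifying every column avoids the error and also keeps the query correct if a same-named column is added to another table later.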

## SQLite-SPECIFIC REQUIREMENTS

### Critical Function Mappings
- String concatenation: `||` NOT `CONCAT()`
- Date extraction: `STRFTIME('%Y', date)` NOT `YEAR()`
- String length: `LENGTH()` NOT `LEN()`
- Null handling: `COALESCE()` NOT `ISNULL()`
- Row limiting: `LIMIT n` NOT `TOP n`
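- Boolean values: Use 1/0 NOT TRUE/FALSE (the keywords are only recognized in SQLite 3.23.0+)
- LIKE operator: Case-insensitive for ASCII characters by default (unless `PRAGMA case_sensitive_like` is enabled)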
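
A quick sketch that exercises several of these mappings against an in-memory SQLite database; the literal values are arbitrary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ||, LENGTH, COALESCE and STRFTIME are all built into SQLite.
row = conn.execute(
    "SELECT 'Ada' || ' ' || 'Lovelace', "
    "LENGTH('SQLite'), "
    "COALESCE(NULL, 'n/a'), "
    "STRFTIME('%Y', '2024-03-15');"
).fetchone()

# YEAR() comes from other SQL dialects and is not defined in SQLite.
try:
    conn.execute("SELECT YEAR('2024-03-15');")
    year_error = None
except sqlite3.OperationalError as exc:
    year_error = str(exc)
```

Note that `STRFTIME` returns text, so the year comes back as the string `'2024'` rather than a number.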

### Date Operation Patterns
- Extract year: `STRFTIME('%Y', date_column)`
- Extract month: `STRFTIME('%m', date_column)`
- Extract day: `STRFTIME('%d', date_column)`
- Date arithmetic: `DATE(date_column, '+N days')`
- Current date: `DATE('now')`
- Date comparison: Direct string comparison if ISO format
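
The date patterns can be tried out as follows. The `orders` table and its ISO-formatted `order_date` values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT);
INSERT INTO orders (id, order_date) VALUES
    (1, '2023-12-30'), (2, '2024-03-15'), (3, '2024-11-02');
""")

# Extract year, month and day; each comes back as zero-padded text.
parts = conn.execute(
    "SELECT STRFTIME('%Y', order_date), STRFTIME('%m', order_date), "
    "STRFTIME('%d', order_date) FROM orders WHERE id = 2;"
).fetchone()

# Date arithmetic with a modifier string; crosses the year boundary.
shifted = conn.execute(
    "SELECT DATE(order_date, '+10 days') FROM orders WHERE id = 1;"
).fetchone()[0]

# ISO dates sort lexicographically, so plain string comparison works.
in_2024 = conn.execute(
    "SELECT id FROM orders "
    "WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01' ORDER BY id;"
).fetchall()

# Equivalent year filter; compare against the string '2024'.
by_year = conn.execute(
    "SELECT id FROM orders WHERE STRFTIME('%Y', order_date) = '2024' ORDER BY id;"
).fetchall()
```

String comparison is only safe when every stored date uses the same zero-padded ISO format.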

## SPECIAL PATTERNS

### Percentage Calculations
```sql
-- Standard percentage pattern (CAST to REAL avoids integer division)
SELECT CAST(COUNT(CASE WHEN condition THEN 1 END) AS REAL) * 100.0 / COUNT(*)
FROM table;
```
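
Here is the pattern applied to a concrete, invented `orders` table, alongside the integer-division mistake it guards against:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO orders (id, status) VALUES
    (1, 'shipped'), (2, 'shipped'), (3, 'pending'), (4, 'shipped');
""")

# Percentage of shipped orders: 3 of 4 rows.
pct = conn.execute(
    "SELECT CAST(COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS REAL) "
    "* 100.0 / COUNT(*) FROM orders;"
).fetchone()[0]

# Without the cast, integer / integer truncates toward zero.
truncated = conn.execute(
    "SELECT COUNT(CASE WHEN status = 'shipped' THEN 1 END) / COUNT(*) FROM orders;"
).fetchone()[0]
```

`COUNT(CASE WHEN ... THEN 1 END)` counts only matching rows because the `CASE` yields NULL otherwise, and `COUNT` skips NULLs.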

### Counting Distinct Entities
- "How many customers?" → `COUNT(DISTINCT customer_id)` when a customer can appear in several rows (e.g., counting from an orders table)
- "How many orders?" → `COUNT(*)` if the table has one row per order
- "How many different X?" → `COUNT(DISTINCT X)`
- "Number of unique Y?" → `COUNT(DISTINCT Y)`
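
The difference between these counts shows up as soon as an entity repeats. A small sketch with an invented `orders` table, where one customer has two orders and one order has no customer recorded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO orders (id, customer_id) VALUES
    (1, 100), (2, 100), (3, 200), (4, NULL);
""")

counts = conn.execute(
    "SELECT COUNT(*), "                      # rows, i.e. orders
    "COUNT(customer_id), "                   # rows with a non-NULL customer
    "COUNT(DISTINCT customer_id) "           # distinct customers, NULL ignored
    "FROM orders;"
).fetchone()
```

`COUNT(*)` answers "how many orders", while `COUNT(DISTINCT customer_id)` answers "how many customers placed orders".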

### Grouping Without Display
```sql
-- "What is the highest store total?" (grouped by store, but store_id is not displayed)
SELECT MAX(group_sum)
FROM (
  SELECT SUM(amount) AS group_sum
  FROM sales
  GROUP BY store_id
);

-- OR simpler if just need the value:
SELECT SUM(amount)
FROM sales
GROUP BY store_id
ORDER BY SUM(amount) DESC
LIMIT 1;
```
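
Both forms can be checked against each other. The sketch below uses an invented `sales` table with three stores and confirms that the two queries return the same single value without exposing `store_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (store_id INTEGER, amount REAL);
INSERT INTO sales (store_id, amount) VALUES
    (1, 10), (1, 15), (2, 40), (3, 5);
""")

# Subquery form: aggregate per store, then take the maximum.
via_subquery = conn.execute(
    "SELECT MAX(group_sum) FROM "
    "(SELECT SUM(amount) AS group_sum FROM sales GROUP BY store_id);"
).fetchall()

# Simpler form: sort the per-store sums and keep the first.
via_order_by = conn.execute(
    "SELECT SUM(amount) FROM sales GROUP BY store_id "
    "ORDER BY SUM(amount) DESC LIMIT 1;"
).fetchall()

column_count = len(via_order_by[0])
```

Store totals here are 25, 40, and 5, so each query yields one row with one column.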

## PRE-EXECUTION CHECKLIST

Before finalizing SQL:
✓ Column count matches request exactly?
✓ Evidence reconciliation attempted?
✓ Simplest pattern chosen?
✓ String matching appropriate for data type?
✓ All columns qualified in joins?
✓ SQLite syntax verified?
✓ No markdown or comments included?

## COMMON FAILURE PATTERNS

1. **Column Inflation** - Adding unrequested columns
2. **Evidence Literalism** - Using evidence column names verbatim without checking schema variations
3. **Over-Engineering** - Complex queries when simple suffices
4. **Attribution Error** - Wrong table's column version
5. **Ambiguous Joins** - Unqualified column references
6. **Syntax Confusion** - Non-SQLite function usage

## DECISION FLOWCHART

```
Question Analysis
    ↓
How many columns needed? → If 1, SELECT only that column
    ↓
Evidence provided? → Follow literally, with fallback mappings
    ↓
String matching needed? → Check data type, use appropriate operator
    ↓
Aggregation needed? → Use simplest aggregation pattern
    ↓
Joins required? → Qualify ALL columns with aliases
    ↓
Output clean SQL with semicolon
```

## FINAL SUCCESS FORMULA

1. **Count** required output columns
2. **Reconcile** evidence with actual schema
3. **Choose** simplest working pattern
4. **Match** strings appropriately
5. **Qualify** all columns in joins
6. **Output** clean, executable SQL

Remember: Precision and simplicity win. Return EXACTLY what's asked using the simplest approach that works, with smart evidence reconciliation.