# SQL Generation Instructions - Precision Column Separator

## Core SQL Requirements

Generate clean, executable SQL:
- No markdown formatting or code blocks
- No comments or explanatory text  
- Only the SQL statement
- End with semicolon

## 🚨 CRITICAL RULE #1: Evidence is LAW (With Column Separation)

### Evidence Formulas are EXACT Specifications
When evidence provides formulas, implement them EXACTLY:

| Evidence Pattern | MUST Implement As | NEVER Use |
|-----------------|-------------------|------------|
| `COUNT(X)` | `COUNT(X)` | `COUNT(DISTINCT X)` |
| `COUNT(DISTINCT X)` | `COUNT(DISTINCT X)` | `COUNT(X)` |
| `MAX(column)` | `ORDER BY column DESC LIMIT 1` | `COUNT()` or aggregation |
| `MIN(column)` | `ORDER BY column ASC LIMIT 1` (add `WHERE column IS NOT NULL` if NULLs are possible, since SQLite sorts NULLs first) | `COUNT()` or aggregation |
| `SUM(CASE WHEN x)` | Exactly as shown | Alternative formulas |
| `DIVIDE(A, B)` | `A / B` in that order (`CAST(A AS REAL) / B` when both are integers) | `B / A` |
| `>= X` | `>= X` | `> X` |
| `> X` | `> X` | `>= X` |
| `column = 'value'` | Exact case, exact quotes | Different case or wildcards |
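
As a rough illustration of the `MAX`/`MIN` rows above, the sketch below runs the `ORDER BY ... LIMIT 1` form against an in-memory SQLite database. The `player` table and its rows are hypothetical, invented only for this example. It also shows one thing to watch for: SQLite treats NULL as smaller than any value, so an ascending sort returns a NULL row first unless it is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, used only for illustration.
conn.execute("CREATE TABLE player (name TEXT, height INTEGER)")
conn.executemany(
    "INSERT INTO player VALUES (?, ?)",
    [("Ann", 180), ("Bob", 195), ("Cy", None), ("Dee", 170)],
)

# Evidence "tallest refers to MAX(height)": return the row, not the aggregate.
tallest = conn.execute(
    "SELECT name FROM player ORDER BY height DESC LIMIT 1"
).fetchone()[0]

# Evidence "shortest refers to MIN(height)": NULLs sort first in ascending
# order, so the unfiltered query picks the row with the missing height.
shortest_naive = conn.execute(
    "SELECT name FROM player ORDER BY height ASC LIMIT 1"
).fetchone()[0]

# Filtering NULLs first gives the intended row.
shortest = conn.execute(
    "SELECT name FROM player WHERE height IS NOT NULL "
    "ORDER BY height ASC LIMIT 1"
).fetchone()[0]

print(tallest, shortest_naive, shortest)
```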

### NEW: Column Separation Rules
**CRITICAL**: When evidence lists multiple columns without concatenation operators:
- "full name refers to first middle last" → `SELECT first, middle, last` (3 columns)
- "X refers to Y Z" → `SELECT Y, Z` (separate columns)  
- "X refers to Y || Z" → `SELECT Y || Z` (concatenated)
- If no operator shown → Return columns SEPARATELY
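
A minimal sketch of the separation rule, assuming a hypothetical `client` table with `first`, `middle`, and `last` columns. It contrasts the separate-column form with explicit `||` concatenation, and shows that concatenating a NULL yields NULL in SQLite, which is one more reason not to concatenate unless the evidence asks for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, for illustration only.
conn.execute("CREATE TABLE client (first TEXT, middle TEXT, last TEXT)")
conn.executemany(
    "INSERT INTO client VALUES (?, ?, ?)",
    [("Ada", "King", "Lovelace"), ("Alan", None, "Turing")],
)

# Evidence lists columns with no operator: return three separate columns.
separate = conn.execute(
    "SELECT first, middle, last FROM client ORDER BY first"
).fetchall()

# Evidence shows the || operator: return one concatenated column.
joined = conn.execute(
    "SELECT first || ' ' || last FROM client ORDER BY first"
).fetchall()

# Any NULL operand makes the whole concatenation NULL.
with_null = conn.execute(
    "SELECT first || ' ' || middle || ' ' || last FROM client ORDER BY first"
).fetchall()

print(separate, joined, with_null)
```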

### Evidence Column Mappings are MANDATORY
- "X refers to Y" → ALWAYS use column Y when X is mentioned
- "detailed issue refers to Sub-issue" → Use `"Sub-issue"` (double-quoted, because the name contains a hyphen) NOT `Issue`
- "X = 'value'" → Use EXACT value with EXACT case and quotes
- Multiple mappings → Apply ALL of them
- If evidence gives a formula, implement it and IGNORE the natural-language interpretation
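
The sketch below illustrates why a mapped column such as `Sub-issue` needs double quotes. The `events` table name and its row are assumptions made up for the example; only the column names come from the mapping above. Without quotes, SQLite parses `Sub-issue` as the subtraction `Sub - issue` and fails because there is no `Sub` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name and data; column names follow the mapping example.
conn.execute('CREATE TABLE events ("Issue" TEXT, "Sub-issue" TEXT)')
conn.execute("INSERT INTO events VALUES ('Billing', 'Late fee')")

# Quoted identifier: selects the column the evidence points to.
mapped = conn.execute('SELECT "Sub-issue" FROM events').fetchall()

# Unquoted identifier: parsed as an arithmetic expression and rejected.
try:
    conn.execute("SELECT Sub-issue FROM events")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

print(mapped, unquoted_error)
```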

### Ambiguous Negations Warning
⚠️ Be careful with "non-X" patterns:
- "non player/builder" could mean a `NOT IN` exclusion or a literal category value
- When ambiguous, check the analysis for context
- If still unclear, choose the most logical interpretation

## 🚨 CRITICAL RULE #2: SQLite Date/Time Functions

### Date Extraction and Comparison
```sql
-- Extract year from date (STRFTIME returns text: compare with '2016', not 2016)
STRFTIME('%Y', date_column) = '2016'
-- NOT: CAST(date_column AS DATE) LIKE '2016%'

-- Extract month
STRFTIME('%m', date_column) = '03'

-- Extract day
STRFTIME('%d', date_column) = '15'

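-- Date ranges (both endpoints included; if values carry a time part,
-- use '2016-12-31 23:59:59' as the upper bound)
date_column BETWEEN '2016-01-01' AND '2016-12-31'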

-- Alternative for year
date_column LIKE '2016%'  -- Works if date is string format YYYY-MM-DD
```
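
To make the date patterns above concrete, here is a small sketch against a hypothetical `visit` table whose `visited_at` column stores ISO-8601 text (an assumption; check the actual storage format). It shows that `STRFTIME` output must be compared with a string, and that a date-only upper bound in `BETWEEN` drops rows carrying a time part on the last day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; dates are stored as ISO-8601 text.
conn.execute("CREATE TABLE visit (id INTEGER, visited_at TEXT)")
conn.executemany(
    "INSERT INTO visit VALUES (?, ?)",
    [(1, "2016-03-15"), (2, "2016-12-31 18:30:00"), (3, "2017-01-01")],
)

def ids(where):
    """Return the ids of rows matching the given WHERE clause."""
    sql = f"SELECT id FROM visit WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

# STRFTIME returns text, so the string comparison matches...
text_cmp = ids("STRFTIME('%Y', visited_at) = '2016'")
# ...while comparing against the integer 2016 matches nothing.
int_cmp = ids("STRFTIME('%Y', visited_at) = 2016")

# A date-only upper bound excludes '2016-12-31 18:30:00'.
between_date = ids("visited_at BETWEEN '2016-01-01' AND '2016-12-31'")
# Extending the bound to the end of the day keeps it.
between_ts = ids(
    "visited_at BETWEEN '2016-01-01' AND '2016-12-31 23:59:59'"
)

print(text_cmp, int_cmp, between_date, between_ts)
```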

### Time Parsing
```sql
-- For time strings like "HH:MM:SS"
-- Extract minutes (positions 4-5)
CAST(SUBSTR(time_column, 4, 2) AS INTEGER)

-- Extract seconds
CAST(SUBSTR(time_column, 7, 2) AS INTEGER)

-- Convert to total seconds
(CAST(SUBSTR(time_column, 1, 2) AS INTEGER) * 3600 +
 CAST(SUBSTR(time_column, 4, 2) AS INTEGER) * 60 +
 CAST(SUBSTR(time_column, 7, 2) AS INTEGER))
```
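
The `SUBSTR` positions above can be checked with a short sketch. The `lap` table and its values are hypothetical, and the code assumes every value is zero-padded `HH:MM:SS`; other layouts would need different positions. Averaging is done on the converted seconds rather than on the raw string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; times are zero-padded "HH:MM:SS" strings.
conn.execute("CREATE TABLE lap (driver TEXT, lap_time TEXT)")
conn.executemany(
    "INSERT INTO lap VALUES (?, ?)",
    [("A", "00:01:30"), ("B", "01:02:03")],
)

TOTAL_SECONDS = (
    "CAST(SUBSTR(lap_time, 1, 2) AS INTEGER) * 3600 + "
    "CAST(SUBSTR(lap_time, 4, 2) AS INTEGER) * 60 + "
    "CAST(SUBSTR(lap_time, 7, 2) AS INTEGER)"
)

# SUBSTR is 1-indexed: minutes start at position 4, seconds at position 7.
parts = conn.execute(
    "SELECT driver, "
    "CAST(SUBSTR(lap_time, 4, 2) AS INTEGER), "
    "CAST(SUBSTR(lap_time, 7, 2) AS INTEGER), "
    f"{TOTAL_SECONDS} "
    "FROM lap ORDER BY driver"
).fetchall()

# Average the numeric total, never the raw time string.
avg_seconds = conn.execute(
    f"SELECT AVG({TOTAL_SECONDS}) FROM lap"
).fetchone()[0]

print(parts, avg_seconds)
```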

## 🚨 CRITICAL RULE #3: COUNT and DISTINCT Patterns

### When to use COUNT(DISTINCT)
**USE COUNT(DISTINCT) ONLY when:**
- Evidence explicitly says `COUNT(DISTINCT ...)`
- Question asks "how many different/unique X"
- Question asks "number of distinct X"

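**USE plain COUNT (no DISTINCT) when:**
- Evidence says `COUNT(X)` without DISTINCT → keep `COUNT(X)` as written (it skips NULLs, unlike `COUNT(*)`)
- Question asks "how many" without "different/unique" → `COUNT(*)`
- Counting rows, records, or occurrences → `COUNT(*)`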

### Special COUNT Cases
```sql
-- "How many years did X play" - check evidence
-- If evidence says COUNT(year) → use COUNT
-- If evidence says MAX-MIN → use calculation

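-- "How many days" - usually means distinct dates, so DISTINCT is likely needed
-- even without the word "different"; confirm against the evidence
COUNT(DISTINCT date)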

-- "How many times" - usually total count
COUNT(*)
```
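
The difference between the COUNT forms is easiest to see on data with duplicates and a NULL. The sketch below uses a hypothetical `appearance` table; the four queries are the variants discussed above and return four different answers for the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a duplicate year and a NULL year.
conn.execute("CREATE TABLE appearance (player TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO appearance VALUES (?, ?)",
    [("Ann", 2001), ("Ann", 2001), ("Ann", 2003), ("Ann", None)],
)

def scalar(expr):
    """Evaluate one aggregate expression over the whole table."""
    return conn.execute(f"SELECT {expr} FROM appearance").fetchone()[0]

count_star = scalar("COUNT(*)")                 # every row
count_year = scalar("COUNT(year)")              # rows where year is not NULL
count_distinct = scalar("COUNT(DISTINCT year)") # distinct non-NULL years
year_span = scalar("MAX(year) - MIN(year)")     # the MAX-MIN calculation

print(count_star, count_year, count_distinct, year_span)
```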

## 🚨 CRITICAL RULE #4: Table Selection Rules

### Junction Tables vs Base Tables
**Junction tables** often contain the actual data:
- Tables with two or more foreign keys usually hold the rows that link entities, so query them rather than only the base tables they reference
- Check analysis for "CRITICAL TABLE SELECTION RULES"

### Foreign Key Joins Are Better
**ALWAYS prefer foreign key joins:**
```sql
-- GOOD: Join on foreign key
JOIN Master m ON a.playerID = m.playerID

-- BAD: Join on name concatenation
JOIN Master m ON (firstName || ' ' || lastName) = name
```
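
One way to see why the key join is safer: names are not unique. In the sketch below, `Master` and `playerID` follow the snippet above, while the `Awards` table, its `name` column, and all rows are assumptions for illustration. Two players share a name, so the name-based join attributes one award to both of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Master/playerID follow the snippet above; Awards and the data are hypothetical.
conn.execute(
    "CREATE TABLE Master (playerID TEXT PRIMARY KEY, "
    "firstName TEXT, lastName TEXT)"
)
conn.execute(
    "CREATE TABLE Awards (playerID TEXT REFERENCES Master(playerID), "
    "name TEXT, award TEXT)"
)
conn.executemany(
    "INSERT INTO Master VALUES (?, ?, ?)",
    [("smithjo01", "John", "Smith"), ("smithjo02", "John", "Smith")],
)
conn.execute("INSERT INTO Awards VALUES ('smithjo01', 'John Smith', 'MVP')")

# GOOD: the foreign key identifies exactly one player.
by_key = conn.execute(
    "SELECT m.playerID FROM Awards a "
    "JOIN Master m ON a.playerID = m.playerID"
).fetchall()

# BAD: the concatenated name matches both players named John Smith.
by_name = conn.execute(
    "SELECT m.playerID FROM Awards a "
    "JOIN Master m ON (m.firstName || ' ' || m.lastName) = a.name "
    "ORDER BY m.playerID"
).fetchall()

print(by_key, by_name)
```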

## 🚨 CRITICAL RULE #5: Column Selection Precision

### What to Return - EXACT Rules
| Question Pattern | Return EXACTLY | Example |
|-----------------|----------------|----------|
| "What is the full name" + evidence "refers to X Y Z" | `SELECT X, Y, Z` | Separate columns |
| "List the full name" + evidence "refers to X Y Z" | `SELECT X, Y, Z` | Separate columns |
| "What is X?" | `SELECT X` | Single column |
| "List X" | `SELECT X` | Just X, nothing else |
| "Give X and Y" | `SELECT X, Y` | Both, in that order |
| "Calculate X and Y" | Return both if evidence doesn't combine | Check evidence |
| "Top N X" | Just X unless asked for more | `SELECT X ... LIMIT N` |
| "How many?" | `SELECT COUNT(*)` | Just the count |

### Context-Aware Filters
Some contexts require implicit filters:
- Award queries often need `result = 'Winner'`
- Rankings often need status filters
- Check the analysis for common patterns
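
As a hedged illustration of an implicit filter, the sketch below assumes a hypothetical `Award` table with a `result` column holding `'Winner'` or `'Nominee'`. For a question like "how many awards did Ann win", omitting the filter counts nominations as wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows; only the result = 'Winner' filter comes from the text.
conn.execute("CREATE TABLE Award (person TEXT, award TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO Award VALUES (?, ?, ?)",
    [
        ("Ann", "Best Actor", "Winner"),
        ("Ann", "Best Director", "Nominee"),
        ("Bob", "Best Actor", "Nominee"),
    ],
)

# Without the implicit filter, nominations are counted as wins.
unfiltered = conn.execute(
    "SELECT COUNT(*) FROM Award WHERE person = 'Ann'"
).fetchone()[0]

# With the filter, only actual wins are counted.
filtered = conn.execute(
    "SELECT COUNT(*) FROM Award "
    "WHERE person = 'Ann' AND result = 'Winner'"
).fetchone()[0]

print(unfiltered, filtered)
```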

## 🚨 CRITICAL RULE #6: Aggregation Decision Tree

```
Does question ask for "top/highest/most"?
├─ YES: Is it asking for aggregated values?
│   ├─ YES: Use GROUP BY + ORDER BY aggregate
│   └─ NO: Use ORDER BY + LIMIT (no GROUP BY)
└─ NO: Is COUNT/SUM/AVG needed?
    ├─ YES: Use appropriate aggregation
    └─ NO: Simple SELECT
```

### Common Aggregation Mistakes to AVOID
❌ Using GROUP BY when you just need ORDER BY + LIMIT
❌ Missing GROUP BY when aggregating multiple groups
❌ Wrong interpretation of "most common" (check evidence!)
❌ Using AVG on wrong part of time string
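
The two "YES" branches of the decision tree can give different answers on the same data. The sketch below uses a hypothetical `sale` table: the largest single sale belongs to one customer, while the largest total belongs to another, so picking the wrong branch changes the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE sale (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("Ann", 50), ("Bob", 80), ("Ann", 60), ("Cy", 20)],
)

# "Who made the highest sale?" -> row-level value: ORDER BY + LIMIT, no GROUP BY.
highest_single = conn.execute(
    "SELECT customer FROM sale ORDER BY amount DESC LIMIT 1"
).fetchone()[0]

# "Who has the highest total sales?" -> aggregated value: GROUP BY + ORDER BY aggregate.
highest_total = conn.execute(
    "SELECT customer FROM sale "
    "GROUP BY customer ORDER BY SUM(amount) DESC LIMIT 1"
).fetchone()[0]

print(highest_single, highest_total)
```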

## Percentage Calculation Templates

```sql
-- For COUNT-based percentages
(COUNT(CASE WHEN condition THEN 1 END) * 100.0 / COUNT(*))

-- For SUM-based percentages  
(SUM(CASE WHEN condition THEN amount ELSE 0 END) * 100.0 / SUM(amount))

-- ALWAYS multiply by 100.0 (not 100): integer / integer truncates in SQLite
```
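
The effect of `100.0` versus `100` can be checked directly. The sketch below runs both templates on a hypothetical `orders` table with integer amounts; with `100` the COUNT-based percentage is silently truncated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE orders (status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("shipped", 10), ("shipped", 20), ("pending", 30)],
)

def scalar(expr):
    """Evaluate one aggregate expression over the whole table."""
    return conn.execute(f"SELECT {expr} FROM orders").fetchone()[0]

# Integer arithmetic: 200 / 3 is truncated.
int_pct = scalar(
    "COUNT(CASE WHEN status = 'shipped' THEN 1 END) * 100 / COUNT(*)"
)

# Multiplying by 100.0 forces floating-point division.
real_pct = scalar(
    "COUNT(CASE WHEN status = 'shipped' THEN 1 END) * 100.0 / COUNT(*)"
)

# SUM-based percentage: shipped amount as a share of all amounts.
sum_pct = scalar(
    "SUM(CASE WHEN status = 'shipped' THEN amount ELSE 0 END) "
    "* 100.0 / SUM(amount)"
)

print(int_pct, real_pct, sum_pct)
```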

## Join Patterns

### Priority Order for Joins
1. **Foreign key joins** - Always preferred
2. **ID joins** - When no foreign key is declared but matching ID columns exist
3. **Natural key joins** - Only when necessary
4. **Name concatenation** - AVOID if possible

### When Multiple Joins Possible
- Choose the most direct path
- Use foreign keys from analysis
- Avoid unnecessary intermediate tables

## SQLite Specific Rules

- String concatenation: `||` operator (but check evidence for when to use)
- Date functions: `STRFTIME()` for extraction, `date()`, `datetime()`
- `LIKE` is case-insensitive by default (ASCII letters only); `=` is case-sensitive
- Use double quotes for identifiers with spaces or special characters (e.g., `"Sub-issue"`)
- Check for string booleans: 'TRUE'/'FALSE' vs 1/0
- Use `TRIM()` when trailing spaces might exist
- `BETWEEN` includes both endpoints
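
Several of the rules above can be verified in one short sketch. The `item` table, its quoted `"item name"` column, and its rows are hypothetical; the stored value `'Widget '` deliberately has a trailing space, and the `active` column stores booleans as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the column name contains a space, so it is double-quoted.
conn.execute('CREATE TABLE item ("item name" TEXT, active TEXT)')
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [("Widget ", "TRUE"), ("gadget", "FALSE")],
)

NAME = '"item name"'  # quoted identifier reused in the queries below

def count(where):
    """Count the rows matching the given WHERE clause."""
    sql = f"SELECT COUNT(*) FROM item WHERE {where}"
    return conn.execute(sql).fetchone()[0]

like_hits = count(f"{NAME} LIKE 'WIDGET%'")          # LIKE ignores ASCII case
eq_case = count(f"{NAME} = 'widget '")               # = is case-sensitive
eq_untrimmed = count(f"{NAME} = 'Widget'")           # trailing space blocks the match
eq_trimmed = count(f"TRIM({NAME}) = 'Widget'")       # TRIM removes it
bool_text = count("active = 'TRUE'")                 # boolean stored as text
bool_int = count("active = 1")                       # 1 does not match 'TRUE'
between_edge = conn.execute("SELECT 5 BETWEEN 1 AND 5").fetchone()[0]

print(like_hits, eq_case, eq_untrimmed, eq_trimmed,
      bool_text, bool_int, between_edge)
```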

## Pre-Query Verification Checklist

1. ✅ **Column Separation**: Are columns returned separately when evidence lists them?
2. ✅ **Evidence Check**: Did I follow evidence formulas EXACTLY?
3. ✅ **Column Name Check**: Did I use the EXACT column name from evidence?
4. ✅ **Date Function Check**: Using STRFTIME for date extraction?
5. ✅ **DISTINCT Check**: Using DISTINCT only when explicitly required?
6. ✅ **Table Check**: Using the right table (junction vs base)?
7. ✅ **Join Check**: Using foreign key joins not name concatenation?
8. ✅ **Filter Check**: Including context-aware filters (winners, etc)?
9. ✅ **Division Check**: Using 100.0 for percentages?

## Common Fatal Errors

1. ❌ **Concatenating when columns should be separate** - "X refers to Y Z" means separate!
2. ❌ **Wrong column name** - Use EXACT name from evidence
3. ❌ **Bad date extraction** - Use STRFTIME not CAST
4. ❌ **Name concatenation joins** - Use foreign keys
5. ❌ **Adding DISTINCT when evidence doesn't specify**
6. ❌ **Missing result filters** - Awards need winners
7. ❌ **Wrong time extraction** - Check SUBSTR positions
8. ❌ **Integer division in percentages**

## Final Reminders

- **Evidence columns are separate unless operators shown**
- **Use exact column names from evidence**
- **STRFTIME for dates, not CAST**
- **Foreign key joins, not name matching**
- **Check for implicit filters**
- **Simple queries are often correct**