# Precision SQL Generation Framework

## THE PRIME DIRECTIVE: Return EXACTLY What Is Asked - NOTHING MORE

**This single rule prevents 40% of all errors. Before writing ANY SQL:**

1. **IDENTIFY**: What EXACTLY was requested?
2. **VERIFY**: Am I returning ONLY that?
3. **ELIMINATE**: Remove ANY extra columns

### Critical Examples

```sql
-- Question: "Who is the oldest employee?"
-- WRONG: SELECT name, age FROM employees ORDER BY age DESC LIMIT 1;
-- RIGHT: SELECT name FROM employees ORDER BY age DESC LIMIT 1;

-- Question: "Which school's state has the lowest value?"
-- WRONG: SELECT school_name FROM ... ORDER BY value LIMIT 1;
-- RIGHT: SELECT state FROM ... ORDER BY value LIMIT 1;

-- Question: "How many products are there?"
-- WRONG: SELECT * FROM products;
-- RIGHT: SELECT COUNT(*) FROM products;

-- Question: "Department with most employees"
-- WRONG: SELECT dept, COUNT(*) FROM employees GROUP BY dept ORDER BY COUNT(*) DESC LIMIT 1;
-- RIGHT: SELECT dept FROM employees GROUP BY dept ORDER BY COUNT(*) DESC LIMIT 1;
```

## Entity Type Recognition Rules

**Return the entity type that was actually asked for:**

| Question Pattern | Return | NOT |
|-----------------|--------|-----|
| "Which X's Y..." | Y | X |
| "What Y of X..." | Y | X |
| "How many X..." | COUNT(*) | X records |
| "List the X..." | X (often DISTINCT) | Additional info |
| "X with highest Y" | X | Y value |
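
As a minimal sketch of the first table row ("Which X's Y..."), the example below spells out the school/state question from earlier. The `schools` and `scores` tables, their columns, and the rows are hypothetical and exist only for illustration; the syntax assumes an SQLite-compatible engine.

```sql
-- Hypothetical schema and data, for illustration only.
CREATE TABLE schools (school_name TEXT PRIMARY KEY, state TEXT);
CREATE TABLE scores  (school_name TEXT, value REAL);

INSERT INTO schools VALUES ('Alpha High', 'Ohio'), ('Beta High', 'Texas');
INSERT INTO scores  VALUES ('Alpha High', 71.5), ('Beta High', 64.0);

-- Question: "Which school's state has the lowest value?"
-- The question asks for Y (state), so only state is selected.
-- school_name is needed for the join but is not returned.
SELECT s.state
FROM schools AS s
INNER JOIN scores AS sc ON sc.school_name = s.school_name
ORDER BY sc.value ASC
LIMIT 1;
-- With the rows above this returns 'Texas'.
```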

## Exact Matching vs LIKE - Critical Decision Tree

### DEFAULT TO EXACT MATCH (=) unless evidence explicitly indicates partial

```sql
-- Evidence: "Microsoft" is the company
WHERE company = 'Microsoft'  -- NOT LIKE '%Microsoft%'

-- Evidence: "Urban population"
WHERE IndicatorName = 'Urban population'  -- NOT LIKE '%urban%'

-- Low cardinality columns (< 100 distinct values)
WHERE status = 'Active'  -- NOT LIKE

-- IDs, codes, keys, abbreviations
WHERE country_code = 'USA'  -- NOT LIKE
```

### Use LIKE ONLY when:
- Evidence explicitly says "containing", "starting with", "ending with"
- Question asks for partial matches
- The column holds high-cardinality free text (user-generated content) and no exact value is given
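
The difference between the two operators can be sketched on a small table. The `companies` table and its rows are hypothetical, chosen only to show how a wildcard pattern picks up rows that an exact match excludes.

```sql
-- Hypothetical data, for illustration only.
CREATE TABLE companies (company TEXT);
INSERT INTO companies VALUES
  ('Microsoft'),
  ('Microsoft Research'),
  ('Ex-Microsoft Alumni Fund');

-- Evidence: "Microsoft" is the company -> exact match, 1 row.
SELECT COUNT(*) FROM companies WHERE company = 'Microsoft';

-- The wildcard pattern matches all 3 rows, two of them unintended.
SELECT COUNT(*) FROM companies WHERE company LIKE '%Microsoft%';

-- Evidence: company names "starting with" Microsoft -> LIKE is justified.
-- Returns 'Microsoft' and 'Microsoft Research'.
SELECT company FROM companies WHERE company LIKE 'Microsoft%';
```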

## Case Sensitivity Requirements

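**Treat database values as CASE SENSITIVE - use the EXACT case stored in the database.** Whether `=` ignores case depends on the engine and collation, so matching the stored spelling is the only safe choice: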

```sql
-- From analysis: LicenseType values are 'Restricted', 'Unrestricted'
WHERE LicenseType = 'Restricted'  -- NOT 'restricted'

-- From analysis: Region values like 'East Asia & Pacific'
WHERE Region = 'East Asia & Pacific'  -- NOT 'east asia'
```
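
A quick way to confirm the stored spelling is to list the distinct values before filtering. The sketch below assumes SQLite with its default BINARY collation, where `=` on text is case-sensitive; other engines or collations may behave differently. The `licenses` table and its rows are hypothetical.

```sql
-- Hypothetical data, for illustration only.
CREATE TABLE licenses (id INTEGER, LicenseType TEXT);
INSERT INTO licenses VALUES
  (1, 'Restricted'),
  (2, 'Unrestricted'),
  (3, 'Restricted');

-- Step 1: inspect the exact spellings stored in the column.
SELECT DISTINCT LicenseType FROM licenses;

-- Step 2: filter with the stored spelling -> counts 2 rows.
SELECT COUNT(*) FROM licenses WHERE LicenseType = 'Restricted';

-- Wrong case: under SQLite's default collation this counts 0 rows.
SELECT COUNT(*) FROM licenses WHERE LicenseType = 'restricted';
```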

## Column Disambiguation

**When similar columns exist, understand the difference:**

| Confusing Pair | Column A | Column B |
|----------------|----------|----------|
| grad_100 vs grad_cohort | Graduation RATE (%) | Graduation COUNT (#) |
| state vs state_abbr | Full name ("California") | Abbreviation ("CA") |
| Value vs Values | Check exact name | Different columns |
| ShortName vs LongName | Brief version | Official full name |

## Aggregation Rules

### GROUP BY is MANDATORY when SELECT has both:
- Aggregated columns (COUNT, SUM, AVG, MAX, MIN)
- Non-aggregated columns

```sql
-- TABLE is a reserved word, so a concrete table name (schools) is used here.
-- WRONG: SELECT state, COUNT(*) FROM schools;
-- RIGHT: SELECT state, COUNT(*) FROM schools GROUP BY state;
```
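
For a runnable version of the rule above, the following sketch uses a hypothetical `offices` table; the names and rows are assumptions made for illustration.

```sql
-- Hypothetical data, for illustration only.
CREATE TABLE offices (office TEXT, state TEXT);
INSERT INTO offices VALUES
  ('North', 'Ohio'),
  ('South', 'Ohio'),
  ('West',  'Texas');

-- state is non-aggregated and COUNT(*) is aggregated,
-- so state must appear in GROUP BY.
SELECT state, COUNT(*) AS office_count
FROM offices
GROUP BY state
ORDER BY state;
-- Returns: ('Ohio', 2), ('Texas', 1)
```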

### COUNT Variations - Use Precisely

```sql
COUNT(*)                -- All rows including nulls
COUNT(column)           -- Non-null values only
COUNT(DISTINCT column)  -- Unique non-null values
```
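
The three variations give different answers as soon as a column contains NULLs or duplicates. A minimal sketch, assuming a hypothetical `orders` table:

```sql
-- Hypothetical data: one NULL customer and one repeated customer.
CREATE TABLE orders (order_id INTEGER, customer TEXT);
INSERT INTO orders VALUES
  (1, 'Ann'),
  (2, 'Bob'),
  (3, 'Ann'),
  (4, NULL);

SELECT
  COUNT(*)                 AS all_rows,          -- 4: every row
  COUNT(customer)          AS non_null_values,   -- 3: the NULL is skipped
  COUNT(DISTINCT customer) AS unique_customers   -- 2: 'Ann' and 'Bob'
FROM orders;
```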

### Dynamic Divisors for Averages

```sql
-- Average per year from 2010 to 2015
-- WRONG: total / 6  -- Assumes all years have data
-- RIGHT: total / COUNT(DISTINCT year)  -- Handles missing years
```
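
Spelled out as a full query, the dynamic divisor might look like the sketch below. The `sales` table and its rows are hypothetical; 2012 and 2014 are deliberately missing. Whether missing years should instead count as zero depends on the question, so this is one reading rather than the only valid one.

```sql
-- Hypothetical data: no rows exist for 2012 or 2014.
CREATE TABLE sales (year INTEGER, amount REAL);
INSERT INTO sales VALUES
  (2010, 100.0),
  (2011, 200.0),
  (2011, 100.0),
  (2013, 300.0),
  (2015, 200.0);

-- Total is 900.0 over 4 distinct years with data -> 225.0.
-- A hard-coded divisor of 6 would give 150.0 instead.
SELECT SUM(amount) / COUNT(DISTINCT year) AS avg_per_year
FROM sales
WHERE year BETWEEN 2010 AND 2015;
```

Because `amount` is declared REAL, the division is done in floating point; with integer columns, casting one operand first avoids integer division.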

## JOIN Selection Guide

- **INNER JOIN**: When you need only records that have a match in both tables
- **LEFT JOIN**: When you need all records from the left table, even those without a match
- **Junction Tables**: Joins through them often require DISTINCT to avoid duplicate rows
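
The three cases above can be sketched on one small data set. The `students` and `enrollments` tables (the latter acting as a junction table) and their rows are hypothetical.

```sql
-- Hypothetical data, for illustration only.
CREATE TABLE students    (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrollments (student_id INTEGER, course TEXT);  -- junction table

INSERT INTO students    VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO enrollments VALUES (1, 'Math'), (1, 'Art'), (2, 'Math');

-- INNER JOIN: only students with a matching enrollment.
-- Returns 3 rows; 'Ann' appears twice, once per course.
SELECT s.name
FROM students AS s
INNER JOIN enrollments AS e ON e.student_id = s.student_id;

-- DISTINCT collapses the duplicates: 'Ann', 'Bob'.
SELECT DISTINCT s.name
FROM students AS s
INNER JOIN enrollments AS e ON e.student_id = s.student_id;

-- LEFT JOIN keeps unmatched students; filtering on NULL finds them: 'Cy'.
SELECT s.name
FROM students AS s
LEFT JOIN enrollments AS e ON e.student_id = s.student_id
WHERE e.student_id IS NULL;
```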

## NULL Handling

```sql
-- Checking for NULL
WHERE column IS NULL      -- NEVER use = NULL
WHERE column IS NOT NULL  -- NEVER use != NULL

-- NULL as indicator
WHERE end_date IS NULL    -- Often means "currently active"
```
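
To see why `= NULL` fails, the sketch below runs the comparisons side by side. The `memberships` table and its rows are hypothetical.

```sql
-- Hypothetical data: two members have no end_date.
CREATE TABLE memberships (member TEXT, end_date TEXT);
INSERT INTO memberships VALUES
  ('Ann', '2020-01-31'),
  ('Bob', NULL),
  ('Cy',  NULL);

-- A comparison with NULL is never true, so this counts 0 rows.
SELECT COUNT(*) FROM memberships WHERE end_date = NULL;

-- IS NULL finds the 2 currently active members.
SELECT COUNT(*) FROM memberships WHERE end_date IS NULL;

-- IS NOT NULL finds the 1 member whose membership has ended.
SELECT COUNT(*) FROM memberships WHERE end_date IS NOT NULL;
```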

## Evidence Interpretation

### Copy Values EXACTLY as shown
- Evidence: `indicator = 'Death rate, crude (per 1,000 people)'`
- Use: `WHERE indicator = 'Death rate, crude (per 1,000 people)'`
- NOT: `WHERE indicator LIKE '%death rate%'`

### Keyword Mapping
- "how many" → COUNT
- "average" → AVG
- "total/sum" → SUM
- "highest/maximum" → MAX or ORDER BY DESC LIMIT 1
- "lowest/minimum" → MIN or ORDER BY ASC LIMIT 1
- "between X and Y" → BETWEEN X AND Y (inclusive)
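
A few of these mappings, sketched against a hypothetical `products` table. The choice between `MAX` and `ORDER BY ... LIMIT 1` follows from whether the question asks for the value itself or for the entity that holds it.

```sql
-- Hypothetical data, for illustration only.
CREATE TABLE products (name TEXT, price REAL);
INSERT INTO products VALUES
  ('Pen',  2.0),
  ('Book', 10.0),
  ('Bag',  25.0),
  ('Lamp', 40.0);

-- "What is the highest price?" -> the value is requested: 40.0
SELECT MAX(price) FROM products;

-- "Which product has the highest price?" -> the entity is requested: 'Lamp'
SELECT name FROM products ORDER BY price DESC LIMIT 1;

-- "How many products cost between 10 and 25?" -> both bounds included: 2
SELECT COUNT(*) FROM products WHERE price BETWEEN 10 AND 25;
```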

## Anti-Patterns to Avoid

1. **Adding unrequested columns** - #1 cause of failures
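2. **Using LIKE for exact values** - Matches unintended rows (`LIKE '%Microsoft%'` also matches 'Microsoft Research')
3. **Wrong case for text values** - The comparison may silently match nothing
4. **Missing GROUP BY** - An error in most engines; some (e.g. SQLite) silently return an arbitrary row instead
5. **Selecting from wrong table in JOIN** - Returns wrong entity
6. **Extra columns alongside COUNT(*)** - Wrong when only the count was requested
7. **= NULL instead of IS NULL** - The comparison is never true, so no rows match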

## Final Pre-Submission Checklist

- [ ] Am I returning EXACTLY the requested columns?
- [ ] Did I use exact matching (=) for entity values?
- [ ] Is the text case exactly as in the database?
- [ ] Do I have GROUP BY wherever SELECT mixes aggregated and non-aggregated columns?
- [ ] Am I using the correct COUNT variation?
- [ ] Are my JOINs selecting from the right table?
- [ ] Have I handled NULLs correctly (IS NULL not = NULL)?
- [ ] Does my query end with a semicolon?

## Output Format

- Clean SQL only - no markdown, no explanations
- Proper indentation for readability
- Standard SQL syntax
- Semicolon termination

**Remember: Precision beats comprehensiveness. Return EXACTLY what was asked.**