# SQL Generation Instructions - Data-Driven Precision

## PRIMARY DIRECTIVE: Column Selection Precision

**RETURN EXACTLY WHAT IS REQUESTED - NOTHING MORE, NOTHING LESS**

Before writing ANY SQL, identify:
1. **What type of answer?** (name, count, value, list)
2. **How many columns?** (usually just ONE)
3. **From which table?** (check ownership in analysis)

## Column Selection Rules (Prevents 50% of Errors)

### The Golden Rule
```sql
-- Question: "Who scored the most points?"
-- WRONG: SELECT player_name, MAX(points) ...
-- RIGHT: SELECT player_name ...

-- Question: "How many customers in California?"
-- WRONG: SELECT * FROM customers WHERE state = 'CA'
-- RIGHT: SELECT COUNT(*) FROM customers WHERE state = 'CA'

-- Question: "What is the salary of John?"
-- WRONG: SELECT name, salary FROM employees WHERE name = 'John'
-- RIGHT: SELECT salary FROM employees WHERE name = 'John'
```
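To see why the extra column counts as an error, here is a minimal sketch using Python's built-in `sqlite3` module. It runs against a hypothetical in-memory `players` table whose names and values are invented for illustration. Both queries identify the same player, but only the second returns the one-column shape the question asks for.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (player_name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Ann", 31), ("Ben", 45), ("Cy", 27)],
)

# WRONG shape: SQLite accepts the bare column next to MAX(), but returns two columns.
wrong = conn.execute("SELECT player_name, MAX(points) FROM players").fetchall()

# RIGHT shape: only the entity that answers "who".
right = conn.execute(
    "SELECT player_name FROM players ORDER BY points DESC LIMIT 1"
).fetchall()

print(wrong)  # [('Ben', 45)]
print(right)  # [('Ben',)]
```

The result rows differ, so any exact comparison of query results would mark the first query as wrong even though it "found" the right player.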

### Decision Tree for Column Selection
- **"Who/Which/What [entity]..."** → Return the entity identifier only
- **"How many..."** → Return COUNT(*) only
- **"What is the [property]..."** → Return the property only
- **"List/Show..."** → Return the requested items only
- **"[Entity] with most/least..."** → Return entity only, NOT the measure
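The decision tree above can be sketched as a keyword router. This is a toy heuristic, not a real classifier. The `RULES` list and the `expected_shape` function are hypothetical names, and the patterns only cover the phrasings listed above. The more specific patterns are checked first.

```python
import re

# Hypothetical keyword router mirroring the decision tree; order matters.
RULES = [
    (r"^how many\b", "COUNT(*) only"),
    (r"\b(most|least|highest|lowest)\b", "entity identifier only"),
    (r"^(list|show)\b", "requested items only"),
    (r"^what is the\b", "property only"),
    (r"^(who|which|what)\b", "entity identifier only"),
]


def expected_shape(question: str) -> str:
    """Return the answer shape the decision tree prescribes, or 'unclear'."""
    q = question.strip().lower()
    for pattern, shape in RULES:
        if re.search(pattern, q):
            return shape
    return "unclear"


print(expected_shape("How many customers in California?"))  # COUNT(*) only
print(expected_shape("Which player scored the most points?"))  # entity identifier only
print(expected_shape("What is the salary of John?"))  # property only
```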

## Data Type Handling (Prevents 20% of Errors)

### Special Format Conversions
When the analysis shows special formats, apply conversions:

```sql
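-- Currency stored as text "US$1,234.56": SUBSTR(..., 4) drops the "US$" prefix,
-- then commas (and any stray '$') are removed before casting
CAST(REPLACE(REPLACE(SUBSTR(salary, 4), ',', ''), '$', '') AS REAL)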

-- Percentage stored as a fraction (0.15 means 15%): convert to percentage points
percentage * 100

-- Date extraction from datetime
DATE(datetime_column)
strftime('%Y', date_column)  -- Year as TEXT, e.g. '2023' (compare to '2023', not 2023)

-- Integer cents to dollars (100.0, not 100, avoids integer division)
amount / 100.0
```
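The conversion expressions above can be checked directly in SQLite. This minimal sketch applies them to literal sample values instead of real columns. The literals are assumptions standing in for `salary`, `percentage`, `datetime_column`, `date_column`, and `amount`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each expression mirrors a conversion snippet, applied to a literal sample value.
row = conn.execute(
    """
    SELECT
      CAST(REPLACE(REPLACE(SUBSTR('US$1,234.56', 4), ',', ''), '$', '') AS REAL),
      0.25 * 100,                          -- fraction -> percentage points
      DATE('2023-07-14 09:30:00'),         -- datetime -> date
      strftime('%Y', '2023-07-14'),        -- year, returned as TEXT
      12345 / 100.0,                       -- cents -> dollars
      12345 / 100                          -- integer division truncates
    """
).fetchone()

currency, pct, day, year, dollars, truncated = row
print(day, year, truncated)  # 2023-07-14 2023 123
```

The last column shows why the cents conversion divides by `100.0`. With two integer operands, SQLite performs integer division and drops the fractional part.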

### NULL Handling
```sql
-- Checking for NULL
column IS NULL        -- Correct
column = NULL         -- WRONG: evaluates to NULL, so it never matches a row

-- Counting non-NULL
COUNT(column)         -- Counts non-NULL values only
COUNT(*)              -- Counts all rows

-- Conditional NULL handling
COALESCE(column, 0)   -- Replace NULL with 0 (standard SQL)
IFNULL(column, 'N/A') -- Two-argument equivalent in SQLite and MySQL
```
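Here is a minimal sketch of these NULL rules on a hypothetical one-column table `t` with one NULL row. The table and values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# "= NULL" is never true, so no rows match.
eq_null = conn.execute("SELECT COUNT(*) FROM t WHERE val = NULL").fetchone()[0]
# "IS NULL" finds the NULL row.
is_null = conn.execute("SELECT COUNT(*) FROM t WHERE val IS NULL").fetchone()[0]
# COUNT(column) skips NULLs; COUNT(*) counts every row.
counts = conn.execute("SELECT COUNT(val), COUNT(*) FROM t").fetchone()
# COALESCE substitutes 0 for the NULL.
filled = conn.execute("SELECT COALESCE(val, 0) FROM t ORDER BY rowid").fetchall()

print(eq_null, is_null, counts, filled)  # 0 1 (2, 3) [(1,), (0,), (3,)]
```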

## Evidence Compliance (Prevents 15% of Errors)

### Direct Evidence Application
When evidence provides mappings, use them EXACTLY:
- **Evidence says:** "status = 'Active'" → Use `WHERE status = 'Active'` (exact case)
- **Evidence says:** "date in 2023" → Use `WHERE strftime('%Y', date) = '2023'`
- **Evidence says:** "California refers to state_code = 'CA'" → Use `state_code = 'CA'`

### Value Matching Rules
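1. **String matching with `=` is case-sensitive** unless specified otherwise (SQLite's default BINARY collation)
2. **Use exact values** from evidence; don't reinterpret them
3. **Partial matches** require LIKE with wildcards: `column LIKE '%pattern%'`. SQLite's LIKE is case-insensitive for ASCII letters, so use `column GLOB '*pattern*'` when case must match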
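Here is a quick check of how SQLite treats letter case with `=`, `LIKE`, and `GLOB`. It is a minimal sketch on a hypothetical `users` table whose values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (status TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO users VALUES (?)", [("Active",), ("active",), ("Inactive",)]
)


def count(sql: str) -> int:
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]


exact = count("SELECT COUNT(*) FROM users WHERE status = 'Active'")   # case-sensitive
like = count("SELECT COUNT(*) FROM users WHERE status LIKE 'active'")  # ignores ASCII case
glob = count("SELECT COUNT(*) FROM users WHERE status GLOB 'Act*'")    # case-sensitive pattern

print(exact, like, glob)  # 1 2 1
```

If the evidence gives an exact value such as `'Active'`, `=` matches only that spelling. A LIKE pattern would silently pull in `'active'` as well.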

## Join Patterns (Prevents 10% of Errors)

### Attribution Rule
**The table that OWNS the column must be in your query**

```sql
-- If analysis says "customers OWNS customer_name"
-- Then customer_name MUST come from customers table:
SELECT c.customer_name FROM customers c ...

-- NEVER take a column from a table that doesn't own it
```
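Attribution matters in practice because joined tables often share column names. In this minimal sketch, the schema is hypothetical: `customers` and `products` both have a `name` column. An unqualified `name` fails, while qualifying it with the owning table's alias returns the intended value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: customers and products both have a "name" column.
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, product_id INTEGER);
    INSERT INTO customers VALUES (1, 'Dana');
    INSERT INTO products VALUES (10, 'Lamp');
    INSERT INTO orders VALUES (1, 10);
    """
)

joins = """
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
"""

try:
    conn.execute("SELECT name " + joins)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)  # SQLite reports an ambiguous column name

# Taking the column from the table that owns it removes the ambiguity.
owned = conn.execute("SELECT c.name " + joins).fetchall()
print(owned)  # [('Dana',)]
```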

### Common Join Patterns
```sql
-- Simple join
FROM table1 t1 
JOIN table2 t2 ON t1.foreign_key = t2.primary_key

-- Multiple joins (follow shortest path)
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id

-- Junction table (for many-to-many)
FROM students s
JOIN enrollments e ON s.id = e.student_id  
JOIN courses c ON e.course_id = c.id
```
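The junction-table pattern can be run end to end. This minimal sketch assumes a hypothetical students/enrollments/courses schema with invented rows. It answers "Which courses is Ana taking?" and returns only the course titles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical many-to-many schema matching the junction-table pattern.
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER);
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Raj');
    INSERT INTO courses VALUES (1, 'Databases'), (2, 'Compilers');
    INSERT INTO enrollments VALUES (1, 1), (1, 2), (2, 1);
    """
)

# "Which courses is Ana taking?" -> course titles only.
titles = conn.execute(
    """
    SELECT c.title
    FROM students s
    JOIN enrollments e ON s.id = e.student_id
    JOIN courses c ON e.course_id = c.id
    WHERE s.name = 'Ana'
    ORDER BY c.title
    """
).fetchall()
print(titles)  # [('Compilers',), ('Databases',)]
```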

## Aggregation Patterns (Prevents 5% of Errors)

### Aggregation Functions by Question Type
- **"How many..."** → `COUNT(*)`
- **"Total..."** → `SUM(column)`
- **"Average..."** → `AVG(column)`
- **"Maximum/Highest..."** → `MAX(column)`
- **"Minimum/Lowest..."** → `MIN(column)`
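All aggregates except `COUNT(*)` skip NULLs. That can change the answer when a column is nullable. Here is a minimal sketch on a hypothetical `sales` table with one NULL `amount`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO sales VALUES (?)", [(10,), (20,), (30,), (None,)])

row = conn.execute(
    """
    SELECT COUNT(*), COUNT(amount), SUM(amount), AVG(amount), MAX(amount), MIN(amount)
    FROM sales
    """
).fetchone()
# AVG divides by the 3 non-NULL values, not by all 4 rows.
print(row)  # (4, 3, 60, 20.0, 30, 10)
```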

### GROUP BY Rules
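**Every non-aggregated column in SELECT must be in GROUP BY** (standard SQL requires this; SQLite accepts such "bare" columns but fills them from an arbitrary row in each group)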

```sql
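-- WRONG: name is neither grouped nor aggregated
SELECT dept, name, AVG(salary) FROM employees GROUP BY dept

-- RIGHT syntactically, but returns one row per (dept, name) pair
SELECT dept, name, AVG(salary) FROM employees GROUP BY dept, name

-- OR (if you only want the department average)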
SELECT dept, AVG(salary) FROM employees GROUP BY dept
```
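This minimal sketch uses a hypothetical `employees` table to show what happens to the "WRONG" query in SQLite. The query runs without error, so the engine will not catch the mistake. The department-only form gives one clean average per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept TEXT, name TEXT, salary REAL)")  # hypothetical
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Eng", "Ana", 100.0), ("Eng", "Bo", 80.0), ("Ops", "Cy", 60.0)],
)

# Runs in SQLite: name is a bare column holding some row's value per group.
bare = conn.execute(
    "SELECT dept, name, AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()

# Department average only: every selected column is grouped or aggregated.
per_dept = conn.execute(
    "SELECT dept, AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()
print(per_dept)  # [('Eng', 90.0), ('Ops', 60.0)]
```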

### Finding Extremes Pattern
```sql
-- "Entity with highest/most value"
SELECT entity_column
FROM table
ORDER BY value_column DESC
LIMIT 1;

-- "Top N entities"
SELECT entity_column
FROM table
ORDER BY value_column DESC
LIMIT N;
```
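Two edge cases affect the extremes pattern: ties and NULLs. This minimal sketch uses a hypothetical `players` table with a tie at the top and a NULL score. `LIMIT 1` keeps one of the tied rows, while comparing against `MAX()` keeps all of them. Which form fits depends on whether the question expects a single entity. For "lowest" with `ASC`, SQLite sorts NULLs first, so it helps to exclude them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (player_name TEXT, points INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Ann", 45), ("Ben", 45), ("Cy", 27), ("Dee", None)],
)

# ORDER BY ... LIMIT 1 returns exactly one row even when several rows tie.
top_one = conn.execute(
    "SELECT player_name FROM players ORDER BY points DESC LIMIT 1"
).fetchall()

# Comparing against MAX() returns every tied entity.
all_top = conn.execute(
    """
    SELECT player_name FROM players
    WHERE points = (SELECT MAX(points) FROM players)
    ORDER BY player_name
    """
).fetchall()

# Ascending order puts NULL first in SQLite, so filter NULLs out for "lowest".
lowest = conn.execute(
    "SELECT player_name FROM players WHERE points IS NOT NULL ORDER BY points ASC LIMIT 1"
).fetchall()

print(len(top_one), all_top, lowest)  # 1 [('Ann',), ('Ben',)] [('Cy',)]
```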

## Query Optimization Guidelines

### Efficiency Patterns (from BIRD research)
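1. **Prefer JOINs over subqueries** when possible
2. **Filter on indexed columns** mentioned in the analysis (the query planner, not the query text, decides whether an index is used)
3. **Filter early**: put selective conditions in WHERE so fewer rows reach joins and aggregates
4. **Avoid `SELECT *`** for specific requests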
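You can ask SQLite whether a filter actually hits an index by using `EXPLAIN QUERY PLAN`. This minimal sketch assumes a hypothetical `customers` table and a hypothetical index `idx_customers_state`. The wording of the plan's detail text varies across SQLite versions, so the sketch only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and index standing in for "indexes mentioned in analysis".
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_customers_state ON customers(state)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM customers WHERE state = 'CA'"
).fetchall()

# The last field of each plan row is a human-readable description of that step.
details = [row[-1] for row in plan]
uses_index = any("idx_customers_state" in d for d in details)
print(details)
```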

### Simple Query Templates
```sql
-- Single value lookup
SELECT column FROM table WHERE condition;

-- Count with filter
SELECT COUNT(*) FROM table WHERE condition;

-- Top N by measure
SELECT column FROM table ORDER BY measure DESC LIMIT N;

-- Aggregation by group
SELECT group_col, AGG(value_col) FROM table GROUP BY group_col;
```

## Pre-Submission Checklist

Before returning your SQL query, verify:

- [ ] **Column Count**: Am I returning the exact number of columns requested?
- [ ] **Column Identity**: Is each column exactly what was asked for?
- [ ] **No Extras**: Did I resist adding "helpful" additional columns?
- [ ] **Ownership Check**: Is each column from its owning table?
- [ ] **Evidence Applied**: Did I use evidence values exactly as provided?
- [ ] **Format Handled**: Did I apply necessary conversions for special formats?
- [ ] **GROUP BY Complete**: Are all non-aggregated columns in GROUP BY?
- [ ] **Clean Output**: Just SQL, no markdown, ending with semicolon?
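A few checklist items are mechanical and can be checked automatically: column count, no markdown, and a trailing semicolon. Column identity, ownership, and evidence still need judgment. The `check_query` helper below is hypothetical. It is a minimal sketch that runs the candidate against a SQLite connection and reads the column count from `cursor.description`.

```python
import sqlite3
from typing import List


def check_query(conn: sqlite3.Connection, sql: str, expected_columns: int) -> List[str]:
    """Hypothetical helper: report mechanical checklist problems in a candidate query."""
    text = sql.strip()
    if "```" in text:
        return ["contains markdown fences"]
    problems = []
    if not text.endswith(";"):
        problems.append("missing trailing semicolon")
    try:
        cur = conn.execute(text)
    except sqlite3.Error as exc:
        return problems + [f"does not execute: {exc}"]
    # description is set for SELECT statements even when no rows match.
    ncols = len(cur.description) if cur.description else 0
    if ncols != expected_columns:
        problems.append(f"returns {ncols} columns, expected {expected_columns}")
    return problems


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")  # hypothetical table

print(check_query(conn, "SELECT salary FROM employees WHERE name = 'John';", 1))  # []
print(check_query(conn, "SELECT name, salary FROM employees WHERE name = 'John'", 1))
# ['missing trailing semicolon', 'returns 2 columns, expected 1']
```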

## Common Mistakes to Avoid

### NEVER:
- Return name when asked for count
- Return count when asked for name
- Add extra columns for "context"
- Use SELECT * when specific columns needed
- Ignore evidence specifications
- Forget type conversions for special formats
- Mix aggregate and non-aggregate without GROUP BY

### ALWAYS:
- Return exactly what's requested
- Check column ownership before joins
- Apply evidence mappings exactly
- Handle special formats as shown in analysis
- Use simplest query that works
- End with semicolon

## Final Reminder

**Precision > Completeness**

It's better to return exactly what was asked with 100% precision than to add extra "helpful" information. The user asked for specific information - give them exactly that, nothing more.