# SQL Generation Instructions - Evidence-First Approach

## CRITICAL: Output Format
- Generate ONLY the SQL query
- No markdown, no comments, no explanation
- End with semicolon

## Step 1: Parse Evidence FIRST

When evidence is provided, it OVERRIDES every other rule in this document:
- **Column mappings**: "X refers to Y" → Use Y for X
- **Formulas**: Copy exactly as shown
- **Operators**: Use exact operators (> not >=, < not <=)
- **Values**: Use exact values and format

Examples:
- "over 3000 refers to INCOME_K > 3000" → Use `> 3000` NOT `>= 3000`
- "percentage = COUNT(X) * 100 / COUNT(Y)" → Use this exact formula
- "name refers to FirstName, LastName" → Return both columns
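
To see why the exact operator matters, here is a minimal sketch using Python's built-in `sqlite3` module. The `Demog` table and its three rows are made up for illustration; only the `INCOME_K > 3000` condition comes from the evidence example above.

```python
import sqlite3

# Hypothetical table and rows; only INCOME_K and the 3000 threshold come from the evidence example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demog (GEOID INTEGER, INCOME_K REAL)")
conn.executemany("INSERT INTO Demog VALUES (?, ?)", [(1, 2999), (2, 3000), (3, 3001)])

# Evidence says "> 3000", so the boundary row (exactly 3000) must be excluded.
strict = conn.execute("SELECT COUNT(*) FROM Demog WHERE INCOME_K > 3000;").fetchone()[0]
# Using ">=" instead silently includes the boundary row and changes the answer.
loose = conn.execute("SELECT COUNT(*) FROM Demog WHERE INCOME_K >= 3000;").fetchone()[0]

print(strict, loose)  # 1 2
```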

## Step 2: Identify Query Pattern

### Pattern A: Simple Selection
**Question**: "What/Which [entity]..." without aggregation
**Template**: `SELECT [columns] FROM [table] WHERE [conditions]`
**Column Rule**: Return ONLY the entity identifier requested

### Pattern B: Counting
**Question**: "How many..."
**Templates**:
- Total count: `SELECT COUNT(*)`
- Count with condition: `SELECT COUNT(CASE WHEN condition THEN 1 END)`
- Distinct count: `SELECT COUNT(DISTINCT column)` ONLY when "unique" or "different" mentioned
**Column Rule**: Return ONLY the count, nothing else
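
The three counting templates can be compared side by side in a small sketch. The `orders` table and its data are hypothetical, and the three counts are combined into one query only for comparison; a real answer would return a single count.

```python
import sqlite3

# Hypothetical table, used only to contrast the three counting templates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ann", "shipped"), (2, "Ann", "pending"), (3, "Bob", "shipped")],
)

total, shipped, customers = conn.execute(
    """
    SELECT COUNT(*),                                        -- every row
           COUNT(CASE WHEN status = 'shipped' THEN 1 END),  -- rows matching a condition
           COUNT(DISTINCT customer)                         -- unique customers
    FROM orders;
    """
).fetchone()

print(total, shipped, customers)  # 3 2 2
```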

### Pattern C: Top/Most/Highest
**Question**: "[Entity] with most/highest/top..."
**Template**: `SELECT [entity_columns] FROM ... ORDER BY [metric] DESC LIMIT 1`
**NOT**: `WHERE column = (SELECT MAX(column) ...)`
**Column Rule**: Return entity identifier, NOT the metric
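
As a rough illustration of Pattern C, the sketch below answers "Which product has the highest price?" against a made-up `products` table. Note that the query selects the name only and uses the price purely for ordering.

```python
import sqlite3

# Hypothetical table and rows for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Pen", 2.5), ("Desk", 120.0), ("Lamp", 35.0)],
)

# Return the entity identifier (name); the metric (price) appears only in ORDER BY.
row = conn.execute("SELECT name FROM products ORDER BY price DESC LIMIT 1;").fetchone()

print(row)  # ('Desk',)
```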

### Pattern D: Aggregation
**Question**: Contains "total", "sum", "average"
**Templates**:
- SUM: ONLY for totals of numeric values
- COUNT: For counting records/rows/occurrences
- AVG: For averages
**Column Rule**: Return ONLY the aggregate value

### Pattern E: Yes/No Questions
**Question**: "Is/Does..."
**Template**: Based on evidence format:
- If evidence shows 'yes'/'no': `SELECT CASE WHEN condition THEN 'yes' ELSE 'no' END`
- Otherwise: `SELECT COUNT(*) > 0`
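
Both yes/no templates can be sketched as follows, assuming SQLite and a hypothetical `stores` table. In SQLite the bare comparison `COUNT(*) > 0` comes back as the integer 1 or 0; other dialects may represent it differently.

```python
import sqlite3

# Hypothetical table and rows for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (name TEXT, city TEXT)")
conn.executemany("INSERT INTO stores VALUES (?, ?)", [("North", "Oslo"), ("South", "Bergen")])

# Evidence expects 'yes'/'no': wrap the condition in CASE.
yes_no = conn.execute(
    "SELECT CASE WHEN COUNT(*) > 0 THEN 'yes' ELSE 'no' END FROM stores WHERE city = 'Oslo';"
).fetchone()[0]

# No format given in the evidence: return the boolean comparison directly.
flag = conn.execute("SELECT COUNT(*) > 0 FROM stores WHERE city = 'Paris';").fetchone()[0]

print(yes_no, flag)  # yes 0
```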

## Step 3: Column Selection Rules

### STRICT RULE: Return ONLY What's Asked

**Person Identification**:
- "Who" → FirstName, LastName (or name columns)
- "Which person" → Name columns, NOT ID
- "Customer" → Name or other human-readable identifier, NOT the ID column unless specified

**Entity Identification**:
- "What product" → Product name
- "Which store" → Store name or other human-readable identifier
- Return the human-readable identifier, not the ID column

**NEVER Add Extra Columns**:
- Asked for X → Return X only
- Asked for "income" → Return income only, not GEOID
- Asked for count → Return count only, not what's counted

## Step 4: Query Structure Rules

### GROUP BY Requirements
- Required when: SELECT has both aggregate and non-aggregate columns
- Include ALL non-aggregated columns from SELECT
- Place BEFORE ORDER BY
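
A minimal sketch of these GROUP BY rules, using a hypothetical `sales` table: `store` is the only non-aggregated column in the SELECT list, so it is the only column in GROUP BY, and GROUP BY comes before ORDER BY.

```python
import sqlite3

# Hypothetical table and rows for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("North", 10), ("South", 5), ("North", 7)])

# Aggregate (SUM) plus non-aggregate (store) in SELECT, so GROUP BY store is required.
rows = conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY SUM(amount) DESC;"
).fetchall()

print(rows)  # [('North', 17), ('South', 5)]
```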

### ORDER BY + LIMIT Pattern
**Use for**: "most", "highest", "top", "maximum" when returning entities
```sql
ORDER BY aggregate_or_column DESC LIMIT 1
```
**NOT**: WHERE column = (SELECT MAX(column)...)

### JOIN Patterns
- Use simple JOINs when possible
- Avoid unnecessary subqueries
- Wrap table names that collide with reserved words in backticks (especially `transaction`)
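
The sketch below shows a direct JOIN against a table named `transaction`, quoted with backticks. It assumes SQLite, which accepts backtick-quoted identifiers; the schema and data are invented for illustration.

```python
import sqlite3

# Hypothetical schema; `transaction` is quoted because it is a keyword in several dialects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE `transaction` (id INTEGER, customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO `transaction` VALUES (?, ?, ?)", [(10, 1, 50), (11, 2, 20)])

# A single direct JOIN, with no nested SELECT.
names = conn.execute(
    "SELECT c.name FROM customer AS c "
    "JOIN `transaction` AS t ON t.customer_id = c.id "
    "WHERE t.amount > 25;"
).fetchall()

print(names)  # [('Ann',)]
```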

## Step 5: Common Pitfalls to Avoid

### COUNT Confusion
- **COUNT(*)**: Total rows
- **COUNT(column)**: Non-null values in column
- **COUNT(DISTINCT column)**: Unique values
- **COUNT(CASE WHEN...)**: Conditional count
- **NOT SUM**: Never use `SUM` to count records
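
The difference between these forms shows up as soon as a column contains NULLs. The following sketch uses a hypothetical `visits` table with one missing city; the four aggregates are combined in one query only to compare them.

```python
import sqlite3

# Hypothetical table; the second row has a NULL city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER, city TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "Oslo", 30), (2, None, 45), (3, "Oslo", 15)],
)

all_rows, with_city, cities, total_minutes = conn.execute(
    """
    SELECT COUNT(*),              -- all rows, NULLs included
           COUNT(city),           -- rows where city is not NULL
           COUNT(DISTINCT city),  -- unique non-NULL cities
           SUM(minutes)           -- a numeric total, not a record count
    FROM visits;
    """
).fetchone()

print(all_rows, with_city, cities, total_minutes)  # 3 2 1 90
```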

### Operator Precision
- Evidence "> X" → Use `>` not `>=`
- Evidence "< X" → Use `<` not `<=`
- Evidence "between X and Y" → Use `column BETWEEN X AND Y` or `column >= X AND column <= Y` (both bounds inclusive)
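
A quick check of the boundary behaviour, sketched against a hypothetical `scores` table: `BETWEEN` includes both endpoints and matches the explicit `>=`/`<=` form, while strict operators drop the boundary rows.

```python
import sqlite3

# Hypothetical table whose values sit on and around the boundaries 10 and 20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (value INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(9,), (10,), (15,), (20,), (21,)])

def count_where(condition):
    # Helper for this sketch only; it is called with the fixed literal conditions below.
    return conn.execute(f"SELECT COUNT(*) FROM scores WHERE {condition};").fetchone()[0]

inclusive = count_where("value BETWEEN 10 AND 20")
explicit = count_where("value >= 10 AND value <= 20")
exclusive = count_where("value > 10 AND value < 20")

print(inclusive, explicit, exclusive)  # 3 3 1
```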

### Value Matching
- Case-sensitive: 'Gray' ≠ 'gray'
- Spelling-sensitive: 'Gray' ≠ 'Grey'
- Match values exactly as they appear in the database analysis
- Check for plural vs singular ('cloud' vs 'clouds')
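
Here is a small sketch of why the literal must match the stored value exactly. It assumes SQLite, where `=` on text is case-sensitive by default (other dialects or collations may compare differently), and a made-up `cars` table.

```python
import sqlite3

# Hypothetical table holding three near-identical color values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER, color TEXT)")
conn.executemany("INSERT INTO cars VALUES (?, ?)", [(1, "Gray"), (2, "Grey"), (3, "gray")])

# Only the row whose value matches in both spelling and case is returned.
ids = conn.execute("SELECT id FROM cars WHERE color = 'Gray';").fetchall()

print(ids)  # [(1,)]
```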

## Step 6: Verification Checklist

1. ✓ Did I follow evidence formulas exactly?
2. ✓ Am I returning ONLY requested columns?
3. ✓ Is my COUNT vs SUM usage correct?
4. ✓ Did I use ORDER BY LIMIT for "most/top"?
5. ✓ Are table/column names properly quoted if needed?
6. ✓ Did I use the simplest query structure possible?

## Pattern Priority

When multiple patterns could work:
1. Evidence pattern (highest priority)
2. Simple pattern over complex
3. ORDER BY LIMIT over subquery
4. Direct JOIN over nested SELECT

Remember: The database analysis provides exact table names, column names, and sample values. Use them precisely.