# SQL Generation Instructions - Refined CSC Approach

## Three-Pass Validation Process

### Pass 1: Evidence Resolution
**ALWAYS START HERE** - Evidence overrides everything else
1. Extract ALL "X refers to Y" mappings from evidence
2. Map evidence terms to EXACT column names
3. Note any calculations specified (e.g., "subtract", "percentage")
4. If evidence conflicts with schema, TRUST THE EVIDENCE

Evidence Patterns:
- "X refers to column_name" → Use column_name exactly
- "X = value" → Use as literal WHERE condition
- "calculate X" → Implement exact calculation specified
- "X in YYYY" → Temporal condition: filter the relevant date or year column on YYYY
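
As a rough illustration of how these patterns combine, the sketch below assumes a hypothetical `players` table (the table, its columns, and the sample rows are invented for this example) and SQLite syntax. Each evidence phrase maps directly to a column or a literal condition.

```sql
-- Hypothetical schema and data, for illustration only (SQLite dialect assumed).
CREATE TABLE players (
    player_id  INTEGER PRIMARY KEY,
    full_name  TEXT,
    college    TEXT,
    birth_year INTEGER
);
INSERT INTO players VALUES
    (1, 'Ann Lee', 'Duke', 1990),
    (2, 'Bo Chan', 'Duke', 1992),
    (3, 'Cy Diaz', 'UCLA', 1990);

-- Question: "Name the players from Duke university born in 1990."
-- Evidence: "university refers to college"          -> use the college column
--           "name refers to full_name"              -> select full_name
--           "born in 1990 refers to birth_year = 1990" -> literal WHERE condition
SELECT full_name
FROM players
WHERE college = 'Duke'
  AND birth_year = 1990;
```

With the sample rows above, only 'Ann Lee' satisfies both conditions.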

### Pass 2: Relationship Validation
**VERIFY ALL JOINS** before writing SQL
1. Identify all required tables from question
2. Find shortest valid join path
3. Check cardinality (one-to-many needs different handling than many-to-many)
4. Use bridge tables for many-to-many relationships

Join Rules:
```sql
-- One-to-Many: Simple join
SELECT * FROM one_table o
JOIN many_table m ON o.id = m.one_id

-- Many-to-Many: Use bridge table
SELECT * FROM table1 t1
JOIN bridge b ON t1.id = b.table1_id
JOIN table2 t2 ON b.table2_id = t2.id

-- Multiple paths: Choose path with fewest tables
```
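
The cardinality check matters most when aggregating. The following sketch, which assumes hypothetical `teams` and `players` tables and a made-up `budget` column, shows how a one-to-many join repeats the "one" side and inflates a sum.

```sql
-- Hypothetical schema and data, for illustration only (SQLite dialect assumed).
CREATE TABLE teams (
    id        INTEGER PRIMARY KEY,
    team_name TEXT,
    budget    INTEGER
);
CREATE TABLE players (
    id        INTEGER PRIMARY KEY,
    team_id   INTEGER REFERENCES teams(id),
    full_name TEXT
);
INSERT INTO teams VALUES (1, 'Hawks', 100), (2, 'Owls', 50);
INSERT INTO players VALUES (1, 1, 'Ann'), (2, 1, 'Bo'), (3, 2, 'Cy');

-- Fan-out: the Hawks row repeats once per player, so its budget is counted twice.
SELECT SUM(t.budget) AS inflated_total
FROM teams t
JOIN players p ON t.id = p.team_id;   -- 100 + 100 + 50 = 250

-- Aggregating the "one" side without the join avoids the inflation.
SELECT SUM(budget) AS correct_total
FROM teams;                           -- 100 + 50 = 150
```

If the question needs only columns from one table, dropping the unnecessary join is usually the simplest fix.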

### Pass 3: Result Verification
**FINAL CHECK** before returning SQL
1. Confirm the query returns EXACTLY the requested columns
2. Verify aggregation context (GROUP BY vs window functions)
3. Check filter placement (WHERE vs HAVING)
4. Validate type casting for calculations
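
To make the aggregation-context check concrete, here is a minimal sketch contrasting `GROUP BY` with a window function. The `scores` table and its rows are hypothetical, and window functions assume SQLite 3.25 or later.

```sql
-- Hypothetical schema and data, for illustration only.
CREATE TABLE scores (
    player_id INTEGER,
    season    INTEGER,
    points    INTEGER
);
INSERT INTO scores VALUES (1, 2020, 10), (1, 2021, 30), (2, 2020, 20);

-- "Total points per player": one output row per player -> GROUP BY.
SELECT player_id, SUM(points) AS total_points
FROM scores
GROUP BY player_id;

-- "Each season's points alongside the player's total": every input row is
-- kept -> window function instead of GROUP BY.
SELECT player_id, season, points,
       SUM(points) OVER (PARTITION BY player_id) AS total_points
FROM scores;
```

The first query collapses the three sample rows into two; the second keeps all three and repeats each player's total on every row.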

## Critical Rules

### Column Selection
```sql
-- Question: "What is the name..."
-- WRONG: SELECT id
-- WRONG: SELECT *
-- RIGHT: SELECT name

-- Question: "How many..."
-- WRONG: SELECT *
-- WRONG: SELECT id
-- RIGHT: SELECT COUNT(*)

-- Question: "List the teams..."
-- WRONG: SELECT team_id
-- RIGHT: SELECT team_name
```

### Evidence Interpretation
When evidence says:
- "university refers to college" → Use 'college' column, not 'university'
- "name refers to full_name" → Use 'full_name' column exactly
- "highest X" → Use MAX(X) when only the value is requested, or ORDER BY X DESC LIMIT 1 when the row holding that value is requested
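
The two forms for "highest X" answer different questions. The sketch below uses a hypothetical `players` table with an invented `height` column to show the difference.

```sql
-- Hypothetical schema and data, for illustration only (SQLite dialect assumed).
CREATE TABLE players (
    player_id INTEGER PRIMARY KEY,
    full_name TEXT,
    height    INTEGER
);
INSERT INTO players VALUES
    (1, 'Ann Lee', 180),
    (2, 'Bo Chan', 201),
    (3, 'Cy Diaz', 195);

-- "What is the highest height?" -> the value itself.
SELECT MAX(height)
FROM players;               -- 201

-- "Who is the tallest player?" -> a column from the row holding the maximum.
SELECT full_name
FROM players
ORDER BY height DESC
LIMIT 1;                    -- Bo Chan
```

Note that `LIMIT 1` returns a single row even when several rows tie for the maximum; if ties must all be returned, filtering with `WHERE height = (SELECT MAX(height) FROM players)` is one alternative.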

### Aggregation Context
```sql
-- COUNT with DISTINCT for unique entities
SELECT COUNT(DISTINCT player_id)  -- Not COUNT(*)

-- Percentage calculations with proper casting
SELECT CAST(part AS REAL) * 100.0 / total

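-- WHERE filters rows before grouping; HAVING filters groups after aggregation
SELECT team_id, COUNT(*)
FROM players
WHERE status = 'active'  -- Row condition
GROUP BY team_id
HAVING COUNT(*) > 3;     -- Aggregate condition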
```
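
The casting rule exists because dividing two integers truncates in SQLite (and in several other engines). A small sketch, assuming a hypothetical `players` table with a `status` column:

```sql
-- Hypothetical schema and data, for illustration only (SQLite dialect assumed).
CREATE TABLE players (
    player_id INTEGER PRIMARY KEY,
    status    TEXT
);
INSERT INTO players VALUES
    (1, 'active'), (2, 'active'), (3, 'active'), (4, 'retired');

-- WRONG: integer division. 3 / 4 truncates to 0 before the multiplication.
SELECT SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) / COUNT(*) * 100
       AS pct_active
FROM players;               -- 0

-- RIGHT: cast the numerator to REAL so the division keeps its fraction.
SELECT CAST(SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) AS REAL)
       * 100.0 / COUNT(*) AS pct_active
FROM players;               -- 75.0
```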

### Common Error Prevention

#### Error: Wrong Join Column
```sql
-- WRONG: Join on non-key columns
FROM teams t JOIN players p ON t.name = p.team_name

-- RIGHT: Join on foreign keys
FROM teams t JOIN players p ON t.id = p.team_id
```

#### Error: Column vs Value Confusion
```sql
-- Question: "players from California"
-- WRONG: SELECT California FROM players
-- RIGHT:
SELECT * FROM players WHERE state = 'California';

-- Question: "year 2020"
-- WRONG: SELECT 2020 FROM table_name
-- RIGHT: (table_name is a placeholder; TABLE itself is a reserved word)
SELECT * FROM table_name WHERE year = 2020;
```

#### Error: Missing DISTINCT
```sql
-- When joins might create duplicates
SELECT DISTINCT column_name
FROM table1 JOIN table2 ON ...
```
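
A concrete case of the duplicate problem, sketched with hypothetical `teams` and `players` tables: listing teams through a join returns one row per matching player unless `DISTINCT` is applied.

```sql
-- Hypothetical schema and data, for illustration only (SQLite dialect assumed).
CREATE TABLE teams (
    id        INTEGER PRIMARY KEY,
    team_name TEXT
);
CREATE TABLE players (
    id      INTEGER PRIMARY KEY,
    team_id INTEGER REFERENCES teams(id),
    state   TEXT
);
INSERT INTO teams VALUES (1, 'Hawks'), (2, 'Owls');
INSERT INTO players VALUES
    (1, 1, 'California'),
    (2, 1, 'California'),
    (3, 2, 'Texas');

-- Question: "List the teams with players from California."
-- Without DISTINCT, 'Hawks' would appear twice (once per matching player).
SELECT DISTINCT t.team_name
FROM teams t
JOIN players p ON t.id = p.team_id
WHERE p.state = 'California';
```

With the sample rows, the query returns 'Hawks' exactly once.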

## Validation Checklist

Before returning SQL, verify:

- □ **Evidence Check**: All evidence mappings applied correctly
- □ **Table Check**: All required tables included with proper joins
- □ **Column Check**: Returning EXACTLY what was requested
- □ **Type Check**: Proper casting for division and percentages
- □ **Filter Check**: WHERE and HAVING used appropriately
- □ **Distinct Check**: DISTINCT used when duplicates possible
- □ **Limit Check**: LIMIT 1 used for single record requests

## Output Format

Return clean SQL without explanations:
```sql
SELECT column1, column2
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.foreign_id
WHERE condition
GROUP BY column1, column2
HAVING aggregate_condition
ORDER BY column1 DESC
LIMIT N
```

## Priority Order

1. **Evidence is absolute** - Apply evidence mappings exactly
2. **Schema is truth** - Use actual column names from schema; if evidence and schema conflict, evidence wins (see Pass 1)
3. **Patterns are guides** - Use patterns when evidence is unclear
4. **Simplicity wins** - Choose simpler query when multiple options exist

Remember: The database analysis provides specific guidance. Use it to resolve ambiguities and validate your query structure.