# SQL Generation Instructions - CHESS-Enhanced Validator

## 🎯 THE CARDINAL RULE - Return EXACTLY What's Asked

**Before writing ANY SQL, ask yourself:**
1. What EXACTLY does the question ask for?
2. Am I returning ONLY that information?
3. Have I added ANY extra columns not requested?

### ❌ WRONG vs ✅ RIGHT Examples

```sql
-- Question: "What is the total revenue?"
-- ❌ WRONG: returns one row per product instead of a single total
SELECT product, SUM(revenue) FROM sales GROUP BY product;
-- ✅ RIGHT:
SELECT SUM(revenue) FROM sales;

-- Question: "List the top 3 customers by sales"
-- ❌ WRONG: adds SUM(amount) and COUNT(*) columns that were not requested
SELECT customer_name, SUM(amount), COUNT(*) FROM orders GROUP BY customer_name ORDER BY SUM(amount) DESC LIMIT 3;
-- ✅ RIGHT:
SELECT customer_name FROM orders GROUP BY customer_name ORDER BY SUM(amount) DESC LIMIT 3;

-- Question: "How many flights were cancelled?"
-- ❌ WRONG: returns the cancelled rows themselves instead of their count
SELECT * FROM flights WHERE cancelled = 1;
-- ✅ RIGHT:
SELECT COUNT(*) FROM flights WHERE cancelled = 1;
```

## 🔍 CHESS-Inspired Verification Process

### Step 1: Entity Recognition
Before writing SQL, identify ALL entities mentioned:
- **Exact matches**: Look for quoted values in evidence
- **Fuzzy matches**: Consider variations (e.g., "NY" vs "New York")
- **Semantic matches**: Understand synonyms (e.g., "departed" = DEP_TIME)
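
A fuzzy match usually means the filter has to cover every stored spelling of the same entity. A minimal sketch, assuming SQLite and a hypothetical `airports` table (inlined as a CTE so the snippet runs on its own) where the same city is stored both as 'New York' and as 'NY':

```sql
-- Hypothetical sample data standing in for a real table.
WITH airports(code, city) AS (
  VALUES ('JFK', 'New York'), ('LGA', 'NY'), ('BOS', 'Boston')
)
-- Match every stored variant of the entity, not just the spelling used in the question.
SELECT code
FROM airports
WHERE city IN ('New York', 'NY')
ORDER BY code;
-- Returns JFK and LGA; filtering on 'New York' alone would miss LGA.
```

Running `SELECT DISTINCT city` on the real column first is a cheap way to discover which variants actually exist.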

### Step 2: Schema Relevance Check
For each table/column you plan to use, verify:
- Is this column DIRECTLY mentioned in the question/evidence?
- Is this the MOST SPECIFIC column for the requirement?
- Are there any similar columns that might be more appropriate?
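
To spot similar columns before committing to one, it can help to list the table's columns. A sketch assuming SQLite, whose `pragma_table_info` table-valued function returns one row per column; the `flights` table and the meanings given to its columns are hypothetical and exist only so the snippet runs:

```sql
-- Hypothetical table with several similar-looking time columns.
CREATE TABLE flights (
  flight_id    INTEGER PRIMARY KEY,
  CRS_DEP_TIME INTEGER,  -- assumed: scheduled departure time
  DEP_TIME     INTEGER,  -- assumed: actual departure time
  ARR_TIME     INTEGER   -- assumed: actual arrival time
);

-- List every column whose name mentions DEP, so the most specific one can be chosen.
SELECT name
FROM pragma_table_info('flights')
WHERE name LIKE '%DEP%';
-- Returns CRS_DEP_TIME and DEP_TIME: "departed" maps to DEP_TIME, not the scheduled column.
```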

### Step 3: Query Construction with Validation Points

```sql
-- VALIDATION POINT 1: Check SELECT clause
-- Ask: "Am I selecting EXACTLY what was requested?"

-- VALIDATION POINT 2: Check JOINs
-- Ask: "Are these the minimal necessary joins?"

-- VALIDATION POINT 3: Check WHERE conditions
-- Ask: "Do these filters match ALL requirements?"

-- VALIDATION POINT 4: Check aggregations
-- Ask: "Is the grouping/counting logic correct?"
```
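
The four validation points can be applied to a single query. A sketch for the hypothetical question "How many Delta flights were cancelled?", assuming SQLite and two hypothetical tables inlined as CTEs:

```sql
WITH airlines(airline_id, name) AS (
  VALUES (1, 'Delta'), (2, 'United')
),
flights(flight_id, airline_id, cancelled) AS (
  VALUES (10, 1, 1), (11, 1, 0), (12, 2, 1), (13, 1, 1)
)
SELECT COUNT(*)                                      -- VP1: one number, nothing else
FROM flights AS f
JOIN airlines AS a ON a.airline_id = f.airline_id    -- VP2: the only join needed to filter by airline name
WHERE a.name = 'Delta'                               -- VP3: both conditions from the question
  AND f.cancelled = 1;
-- VP4: no GROUP BY, because a single total was requested. Result: 2
```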

## 📊 Common Query Patterns (From CHESS Analysis)

### Pattern 1: Percentage Calculations
```sql
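-- ALWAYS use this formula for percentages (condition and table_name are placeholders):
SELECT CAST(COUNT(CASE WHEN condition THEN 1 END) AS REAL) * 100.0 / COUNT(*)
FROM table_name;
-- NOT: SELECT COUNT(*) * 100.0 / COUNT(*) FROM table_name WHERE condition;
-- This is WRONG: WHERE filters the denominator too, so the result is always 100.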
```
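
The CAST matters because SQLite divides two integers with integer division. A minimal sketch on a hypothetical `flights` table (inlined as a CTE) with one cancelled flight out of three:

```sql
WITH flights(flight_id, cancelled) AS (
  VALUES (1, 1), (2, 0), (3, 0)
)
SELECT
  -- Correct: the CAST forces real division, giving about 33.33.
  CAST(COUNT(CASE WHEN cancelled = 1 THEN 1 END) AS REAL) * 100.0 / COUNT(*) AS pct_ok,
  -- Pitfall: integer division truncates 1 / 3 to 0 before the multiplication, giving 0.
  COUNT(CASE WHEN cancelled = 1 THEN 1 END) / COUNT(*) * 100 AS pct_truncated
FROM flights;
```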

### Pattern 2: Time Comparisons
```sql
-- For "faster than scheduled" or similar:
-- Actual < Scheduled means EARLY/FASTER
SELECT COUNT(CASE WHEN actual_time < scheduled_time THEN 1 END)
FROM table_name;
```
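
A small worked case helps check the direction of the comparison. A sketch assuming SQLite and hypothetical flight durations in minutes, inlined as a CTE:

```sql
WITH flights(flight_id, scheduled_time, actual_time) AS (
  VALUES (1, 120, 110),  -- 10 minutes faster than scheduled
         (2,  90,  95),  -- 5 minutes slower
         (3,  60,  60)   -- exactly on schedule
)
-- "How many flights were faster than scheduled?" Only strictly smaller actual times count.
SELECT COUNT(CASE WHEN actual_time < scheduled_time THEN 1 END) AS faster_than_scheduled
FROM flights;
-- Result: 1 (flight 3 only matches the schedule, so it is not counted)
```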

### Pattern 3: Conditional Aggregations
```sql
-- Use CASE expressions for complex conditions:
SELECT
  SUM(CASE WHEN type = 'A' THEN amount ELSE 0 END) AS type_a_total,
  SUM(CASE WHEN type = 'B' THEN amount ELSE 0 END) AS type_b_total
FROM table_name;
```
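
A runnable version of the pattern, assuming SQLite and a hypothetical `sales` table inlined as a CTE:

```sql
WITH sales(type, amount) AS (
  VALUES ('A', 10), ('B', 5), ('A', 7), ('C', 3)
)
SELECT
  SUM(CASE WHEN type = 'A' THEN amount ELSE 0 END) AS type_a_total,  -- 17
  SUM(CASE WHEN type = 'B' THEN amount ELSE 0 END) AS type_b_total   -- 5
FROM sales;
```

The `ELSE 0` makes a type with no matching rows total 0 rather than NULL, as long as the table itself is not empty; `SUM` over an empty table returns NULL either way.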

### Pattern 4: Top-N with Ties
```sql
-- For "the most" questions, consider ties:
-- Use RANK() or DENSE_RANK() when ties matter
WITH ranked AS (
  SELECT name, COUNT(*) as cnt,
         RANK() OVER (ORDER BY COUNT(*) DESC) as rnk
  FROM table_name
  GROUP BY name
)
SELECT name FROM ranked WHERE rnk = 1;
```
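
The difference only shows up when there is a tie. A sketch assuming SQLite 3.25 or later (for window functions) and a hypothetical `orders` table, inlined as a CTE, in which Ann and Bob both have two orders:

```sql
WITH orders(name) AS (
  VALUES ('Ann'), ('Ann'), ('Bob'), ('Bob'), ('Cy')
),
ranked AS (
  SELECT name,
         RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
  FROM orders
  GROUP BY name
)
-- Returns both Ann and Bob. ORDER BY COUNT(*) DESC LIMIT 1 would return only one of them,
-- and which one is not guaranteed.
SELECT name FROM ranked WHERE rnk = 1 ORDER BY name;
```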

## 🚫 Error Prevention Checklist

### 1. Column Selection Errors (30% of failures)
- [ ] Check: Am I returning ONLY requested columns?
- [ ] Check: Did I accidentally include COUNT(*) when just items were requested?
- [ ] Check: Am I using the right column for the concept?

### 2. Aggregation Errors (25% of failures)
- [ ] Check: Is GROUP BY needed and correct?
- [ ] Check: Are NULL values handled in aggregations?
- [ ] Check: Is DISTINCT needed to avoid duplicates?
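
NULLs and duplicates change aggregate results in ways that are easy to miss. A sketch assuming SQLite and a hypothetical `orders` table inlined as a CTE:

```sql
WITH orders(order_id, customer, discount) AS (
  VALUES (1, 'Ann', 5), (2, 'Ann', NULL), (3, 'Bob', 10)
)
SELECT
  COUNT(*)                 AS all_rows,            -- 3: counts every row
  COUNT(discount)          AS rows_with_discount,  -- 2: skips the NULL
  COUNT(DISTINCT customer) AS customers,           -- 2: Ann is counted once
  AVG(discount)            AS avg_discount         -- 7.5: the NULL is ignored, not treated as 0
FROM orders;
```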

### 3. Join Errors (20% of failures)
- [ ] Check: Are all necessary tables joined?
- [ ] Check: Are join conditions using the right columns?
- [ ] Check: Could a simpler query achieve the same result?
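
An unnecessary join to a one-to-many table duplicates rows and silently inflates aggregates. A sketch assuming SQLite and two hypothetical tables inlined as CTEs, comparing both versions side by side in one SELECT:

```sql
WITH customers(customer_id, credit_limit) AS (
  VALUES (1, 100), (2, 200)
),
orders(order_id, customer_id) AS (
  VALUES (10, 1), (11, 1), (12, 2)
)
SELECT
  -- Wrong: customer 1 appears once per order, so their limit is added twice (100 + 100 + 200 = 400).
  (SELECT SUM(c.credit_limit)
   FROM customers AS c
   JOIN orders AS o ON o.customer_id = c.customer_id) AS inflated_total,
  -- Right: orders is not needed to total credit limits (100 + 200 = 300).
  (SELECT SUM(credit_limit) FROM customers) AS correct_total;
```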

### 4. Filter Errors (15% of failures)
- [ ] Check: Are all conditions from evidence included?
- [ ] Check: Are string comparisons case-sensitive when needed?
- [ ] Check: Are date/time formats correct?
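
Case sensitivity depends on the operator and collation. A sketch assuming SQLite defaults (BINARY collation for `=`, ASCII case-insensitive `LIKE`) and a hypothetical `cities` table inlined as a CTE:

```sql
WITH cities(name) AS (
  VALUES ('Boston'), ('boston'), ('BOSTON')
)
SELECT
  SUM(name = 'Boston')                AS exact_matches,   -- 1: '=' compares case-sensitively by default
  SUM(name = 'Boston' COLLATE NOCASE) AS nocase_matches,  -- 3: NOCASE ignores ASCII case
  SUM(name LIKE 'boston')             AS like_matches     -- 3: LIKE ignores ASCII case by default
FROM cities;
```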

### 5. Logic Errors (10% of failures)
- [ ] Check: Is the mathematical formula correct?
- [ ] Check: Are comparisons using the right operators?
- [ ] Check: Is NULL handling appropriate?
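
A common operator mistake is comparing with `= NULL`, which is never true. A sketch assuming SQLite and a hypothetical `employees` table inlined as a CTE:

```sql
WITH employees(name, manager_id) AS (
  VALUES ('Ann', NULL), ('Bob', 1), ('Cy', NULL)
)
SELECT
  SUM(CASE WHEN manager_id = NULL  THEN 1 ELSE 0 END) AS eq_null,  -- 0: '= NULL' yields NULL, never true
  SUM(CASE WHEN manager_id IS NULL THEN 1 ELSE 0 END) AS is_null   -- 2
FROM employees;
```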

## 🔄 Revision Protocol (Inspired by CHESS CG Agent)

If your initial query seems complex, apply these simplifications:

1. **Remove unnecessary JOINs**: Can you get the answer from fewer tables?
2. **Simplify aggregations**: If only a single filtered count is needed, can `COUNT(*)` with a WHERE clause replace a `COUNT(CASE ...)` expression?
3. **Eliminate subqueries**: Can this be done in a single SELECT?
4. **Check column names**: Are you using the EXACT column names from the schema?
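
As an illustration of steps 1 and 3, consider the hypothetical question "How many orders did customer 1 place?". The sketch below assumes SQLite and two hypothetical tables inlined as CTEs; it returns the over-engineered and the simplified versions side by side only to show that they agree, while the final answer should contain just the simple one:

```sql
WITH customers(customer_id, name) AS (
  VALUES (1, 'Ann'), (2, 'Bob')
),
orders(order_id, customer_id) AS (
  VALUES (10, 1), (11, 1), (12, 2)
)
SELECT
  -- Before: a join and a subquery that add nothing, since customer_id is already on orders.
  (SELECT COUNT(*)
   FROM (SELECT o.order_id
         FROM orders AS o
         JOIN customers AS c ON c.customer_id = o.customer_id
         WHERE c.customer_id = 1)) AS complex_version,
  -- After: one table, one SELECT, same answer (2).
  (SELECT COUNT(*) FROM orders WHERE customer_id = 1) AS simple_version;
```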

## 💡 Final Validation Questions

Before finalizing your SQL:

1. **Output Check**: "If someone asks me '{question}', and I give them the results of this query, will they have EXACTLY what they asked for?"

2. **Efficiency Check**: "Is this the SIMPLEST query that correctly answers the question?"

3. **Evidence Check**: "Have I incorporated ALL hints from the evidence?"

4. **Error Check**: "Have I avoided ALL the common error patterns listed above?"

## 🎓 Remember

- **Less is more**: Simple queries are more likely to be correct
- **Precision over completeness**: Return EXACTLY what's asked, nothing more
- **Evidence is gospel**: If evidence provides a hint, USE IT
- **Test your logic**: Would a human understand why this query answers the question?

**Your goal**: Write SQL that is CORRECT, PRECISE, and MINIMAL.