# SQL Generation Instructions - Selective Precision Optimizer

## CRITICAL PRIORITY ORDER

1. **COLUMN ORDER MATCHING** - Return columns in the EXACT order mentioned in the question
2. **TABLE SELECTION ACCURACY** - Choose the correct table when similar ones exist
3. **ADAPTIVE COMPLEXITY** - Use simple queries for simple questions
4. **PERCENTAGE PRECISION** - Get percentage calculations exactly right
5. **ERROR RECOVERY** - Have fallback patterns for when the primary approach fails

## Core SQL Requirements

Generate clean, executable SQL:
- No markdown formatting or code blocks
- No comments or explanatory text
- Only the SQL statement
- End with semicolon

## 1. Column Order Matching Rules (CRITICAL FIX)

### Match Question Order EXACTLY
The order in which columns appear in the question determines the SELECT order:

```sql
-- "What is the duration and end station?"
SELECT duration, end_station_name  -- duration first, station second

-- "Show the dock count and station name"
SELECT dock_count, name  -- dock count first, name second

-- "Indicate the purchase order ID and tax amount"
SELECT PurchaseOrderID, TaxAmt  -- ID first, amount second

-- "List the name and title"
SELECT Name, Title  -- name first, title second
```
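
The rule above can be checked mechanically. The following is a minimal sketch, assuming SQLite and a hypothetical one-row `trip` table (the schema here is invented for illustration): the result set keeps exactly the column order written in the SELECT list, so a swapped list yields a different result shape even though the data is the same.

```python
import sqlite3

# Hypothetical miniature trip table; the schema is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip (id INTEGER, duration INTEGER, end_station_name TEXT)")
conn.execute("INSERT INTO trip VALUES (1, 420, 'Market at 4th')")


def result_columns(sql):
    """Return the column names of a query result, in result order."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]


# "What is the duration and end station?" -> duration first, station second
as_asked = result_columns("SELECT duration, end_station_name FROM trip;")
swapped = result_columns("SELECT end_station_name, duration FROM trip;")
print(as_asked)
print(swapped)
```

If results are compared position by position, the second query would not match the first even though both return the same values.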

### Aggregate Order Rules
When mixing aggregates with regular columns:
```sql
-- "What is the minimum duration, end station, and dock count?"
SELECT MIN(duration), end_station_name, dock_count  -- MIN first as asked

-- "How much is the tax amount... Indicate the purchase order ID"
SELECT TaxAmt, PurchaseOrderID  -- tax first (main question), ID second (secondary)
```
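
Mixing `MIN()` with plain columns relies on engine behaviour. A minimal sketch, assuming SQLite (which the `CAST(... AS REAL)` idiom in this document suggests) and a hypothetical two-column `trip` table: in SQLite, bare columns next to a single `MIN()` or `MAX()` take their values from the row that holds the minimum or maximum. Other engines may reject such a query or require a GROUP BY.

```python
import sqlite3

# Hypothetical data; table and values are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip (duration INTEGER, end_station_name TEXT)")
conn.executemany(
    "INSERT INTO trip VALUES (?, ?)",
    [(900, "A"), (61, "B"), (300, "C")],
)

# MIN first, then the bare column, in the order the question asks for them.
# SQLite fills end_station_name from the row that holds the minimum duration.
row = conn.execute("SELECT MIN(duration), end_station_name FROM trip;").fetchone()
print(row)
```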

### Multi-Part Questions
Return ALL parts in order mentioned:
```sql
-- "Name all stores and its sales representative"
SELECT StoreName, RepresentativeName  -- stores first, reps second

-- "Give the Title and author's name"
SELECT Title, Name  -- title first, name second
```

## 2. Table Selection Rules (CRITICAL FIX)

### Similar Table Disambiguation
When tables have similar names, use context clues:

```sql
-- "purchase order" → PurchaseOrderHeader (NOT SalesOrderHeader)
-- "sales order" → SalesOrderHeader (NOT PurchaseOrderHeader)
-- "customer order" → Check evidence, likely SalesOrderHeader
-- "vendor order" → PurchaseOrderHeader

-- "stores in territory" → Often through Customer table
-- "store sales rep" → Check if Store has SalesPersonID or go through Customer
```

### Business Entity Navigation (works_cycles specific)
```sql
-- Person info → Person table (FirstName, LastName)
-- Store info → Store table (Name as store name)
-- Territory → SalesTerritory, often joined through Customer
-- Sales rep → SalesPerson joined with Person for names

-- Common path: SalesTerritory → Customer → Store → Person
-- NOT: Store → BusinessEntityAddress → Address → StateProvince
```

### Evidence-Based Table Selection
```sql
-- If evidence mentions specific table.column:
-- "TaxAmt refers to PurchaseOrderHeader.TaxAmt" → Use PurchaseOrderHeader
-- "sales refers to SalesOrderHeader" → Use SalesOrderHeader
```
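
One way to apply this rule is to scan the evidence text for `table.column` references before choosing a table. The helper below is a hypothetical sketch, not part of any existing tool; `tables_in_evidence` and its regular expression are assumptions, and the pattern is a simple heuristic that can also match unrelated dotted words.

```python
import re

# Matches identifiers written as table.column (heuristic, illustration only).
QUALIFIED_NAME = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def tables_in_evidence(evidence):
    """Return (table, column) pairs that appear as table.column in the evidence."""
    return QUALIFIED_NAME.findall(evidence)


explicit = tables_in_evidence("TaxAmt refers to PurchaseOrderHeader.TaxAmt")
implicit = tables_in_evidence("sales refers to SalesOrderHeader")
print(explicit)
print(implicit)
```

When no qualified name is found, as in the second call, the table still has to be picked from the wording of the evidence and the question.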

## 3. Adaptive Complexity Rules

### Simple Questions → Simple Queries
For straightforward questions, use direct approaches:
```sql
-- "What are the ids of trains running east?"
SELECT id FROM trains WHERE direction = 'east';
-- Don't overcomplicate with unnecessary joins or subqueries

-- "How many short cars are hexagon?"
SELECT COUNT(*) FROM cars WHERE len = 'short' AND shape = 'hexagon';
-- Direct and simple
```

### Complex Only When Needed
Add complexity progressively:
```sql
-- Level 1 (Simple): Direct table query
SELECT column FROM table WHERE condition;

-- Level 2 (Moderate): Single join
SELECT t1.col FROM table1 t1
JOIN table2 t2 ON t1.id = t2.foreign_id
WHERE condition;

-- Level 3 (Complex): Multiple joins or subqueries
-- Use only when simpler approaches don't work
```

### Database Complexity Indicators
- **Simple databases**: trains, world, ice_hockey_draft
- **Moderate databases**: authors, mental_health_survey
- **Complex databases**: bike_share_1, works_cycles, soccer_2016

## 4. Percentage Calculation Rules (FIXED)

### Percentage vs Rate vs Ratio
```sql
-- "percentage" or "%" → Multiply by 100
SELECT CAST(numerator AS REAL) * 100 / denominator

-- "rate" → Often means proportion (0 to 1)
SELECT CAST(numerator AS REAL) / denominator

-- "publish rate of 2001" → Proportion, not percentage
SELECT CAST(COUNT(CASE WHEN Year = 2001 THEN 1 END) AS REAL) / COUNT(*)
-- Returns a proportion such as 0.0991, not 9.91
```

### Common Percentage Patterns
```sql
-- Standard percentage:
SELECT CAST(COUNT(CASE WHEN condition THEN 1 END) AS REAL) * 100 / COUNT(*)

-- Percentage with DISTINCT:
SELECT COUNT(DISTINCT CASE WHEN condition THEN id END) * 100.0 / COUNT(DISTINCT id)

-- Rate (no multiplication):
SELECT CAST(SUM(CASE WHEN condition THEN 1 ELSE 0 END) AS REAL) / COUNT(*)
```
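
The patterns above differ only in the cast and the factor of 100, which is easy to get wrong. A minimal sketch, assuming SQLite and a hypothetical three-row `Paper` table: without the cast, integer division truncates the result; with the cast, the percentage is exactly 100 times the rate.

```python
import sqlite3

# Hypothetical data: one of three papers was published in 2001.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Paper (Id INTEGER, Year INTEGER)")
conn.executemany("INSERT INTO Paper VALUES (?, ?)", [(1, 2001), (2, 2005), (3, 2007)])


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Integer arithmetic: 1 * 100 / 3 is truncated.
truncated = scalar(
    "SELECT COUNT(CASE WHEN Year = 2001 THEN 1 END) * 100 / COUNT(*) FROM Paper;"
)
# Percentage: cast first, then multiply by 100.
percentage = scalar(
    "SELECT CAST(COUNT(CASE WHEN Year = 2001 THEN 1 END) AS REAL) * 100 / COUNT(*) FROM Paper;"
)
# Rate: same expression without the multiplication.
rate = scalar(
    "SELECT CAST(COUNT(CASE WHEN Year = 2001 THEN 1 END) AS REAL) / COUNT(*) FROM Paper;"
)
print(truncated, percentage, rate)
```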

### Evidence-Based Formulas
When evidence provides a formula, follow it EXACTLY:
```sql
-- Evidence: "Percentage = Count(X) * 100 / Count(*)"
-- Use: SELECT COUNT(X) * 100.0 / COUNT(*)  (100.0 keeps the division from truncating)

-- Evidence: "rate = Count(Year=2001) / Count(all)"
-- Use: SELECT CAST(COUNT(CASE WHEN Year=2001 THEN 1 END) AS REAL) / COUNT(*)
```

## 5. Join Simplification Rules

### Prefer Direct Paths
```sql
-- GOOD: Direct foreign key joins
FROM Customer c
JOIN Store s ON c.StoreID = s.BusinessEntityID

-- BAD: Indirect through multiple tables
FROM Store s
JOIN BusinessEntityAddress bea ON s.BusinessEntityID = bea.BusinessEntityID
JOIN Address a ON bea.AddressID = a.AddressID
```

### Station Joins (bike_share_1)
```sql
-- PREFER: Name-based joins
FROM trip t
JOIN station s ON t.start_station_name = s.name

-- AVOID: ID joins (unless names unavailable)
FROM trip t
JOIN station s ON t.start_station_id = s.id
```

### Territory Joins (works_cycles)
```sql
-- CORRECT: Through Customer
FROM SalesTerritory st
JOIN Customer c ON st.TerritoryID = c.TerritoryID
JOIN Store s ON c.StoreID = s.BusinessEntityID

-- WRONG: Through Address
FROM Store s
JOIN Address a ON ...
JOIN StateProvince sp ON ...
```

## 6. Date/Time Handling (REFINED)

### Format Flexibility
```sql
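-- Try multiple formats if the first returns no rows:
WHERE start_date LIKE '12/1/2013%'  -- Primary: for 'M/D/YYYY H:MM' strings
WHERE DATE(start_date) = '2013-12-01'  -- Fallback: only works on ISO 'YYYY-MM-DD' values

-- Year queries:
WHERE date LIKE '%2014%'  -- For string dates
WHERE strftime('%Y', date) = '2014'  -- For ISO-formatted dates (SQLite has no YEAR() function)
```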
```
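
Which pattern works depends on how the dates are stored. A minimal sketch, assuming SQLite and a hypothetical `trip` table with 'M/D/YYYY H:MM' strings: the LIKE prefix matches, while `DATE()` returns NULL for non-ISO strings and therefore matches nothing. `strftime('%Y', ...)` is shown on an ISO literal, where it does work.

```python
import sqlite3

# Hypothetical rows stored as 'M/D/YYYY H:MM' text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip (id INTEGER, start_date TEXT)")
conn.executemany(
    "INSERT INTO trip VALUES (?, ?)",
    [(1, "12/1/2013 0:05"), (2, "12/10/2013 8:30"), (3, "1/5/2014 9:00")],
)


def ids(where):
    """Return the ids of trips matching a WHERE clause, in id order."""
    sql = f"SELECT id FROM trip WHERE {where} ORDER BY id;"
    return [r[0] for r in conn.execute(sql)]


by_prefix = ids("start_date LIKE '12/1/2013%'")      # string prefix match
by_date_fn = ids("DATE(start_date) = '2013-12-01'")  # DATE() yields NULL here
by_year = ids("start_date LIKE '%2014%'")            # year match on strings
iso_year = conn.execute("SELECT strftime('%Y', '2014-03-09');").fetchone()[0]
print(by_prefix, by_date_fn, by_year, iso_year)
```

Note that the prefix `'12/1/2013%'` does not match December 10, because the character after `12/1` must be `/`.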

### Temperature Conversions
```sql
-- Celsius to Fahrenheit: F = C * 1.8 + 32
-- Fahrenheit to Celsius: C = (F - 32) / 1.8

-- "20 degrees Celsius" in Fahrenheit column:
WHERE temperature_f = 20 * 1.8 + 32  -- = 68
```
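
The conversion can be applied either to the constant from the question or to the stored column. A minimal sketch, assuming SQLite and a hypothetical `weather` table with a `mean_temperature_f` column (both names are assumptions for illustration):

```python
import sqlite3

# Hypothetical weather rows with temperatures stored in Fahrenheit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (date TEXT, mean_temperature_f INTEGER)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("8/29/2013", 68), ("8/30/2013", 50), ("8/31/2013", 86)],
)

# "20 degrees Celsius": convert the constant to Fahrenheit, compare to the column.
at_20c = [
    r[0]
    for r in conn.execute(
        "SELECT date FROM weather WHERE mean_temperature_f = 20 * 1.8 + 32;"
    )
]
above_20c = [
    r[0]
    for r in conn.execute(
        "SELECT date FROM weather WHERE mean_temperature_f > 20 * 1.8 + 32 ORDER BY date;"
    )
]
# The other direction: report a stored Fahrenheit value in Celsius.
celsius = conn.execute(
    "SELECT (mean_temperature_f - 32) / 1.8 FROM weather WHERE date = '8/30/2013';"
).fetchone()[0]
print(at_20c, above_20c, celsius)
```

Converting the constant rather than the column keeps the WHERE clause a plain comparison against stored values.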

## 7. Error Recovery Patterns

### When Results Are Empty
1. Check join conditions (especially date/zip matching)
2. Try alternative date formats
3. Verify case sensitivity of string values
4. Check for NULL values affecting joins
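
Steps 3 and 4 can be sketched as follows, assuming SQLite and a hypothetical `trains` table: string equality is case sensitive by default, and `= NULL` never matches, so both can silently produce empty results.

```python
import sqlite3

# Hypothetical trains table with one NULL direction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trains (id INTEGER, direction TEXT)")
conn.executemany(
    "INSERT INTO trains VALUES (?, ?)",
    [(1, "east"), (2, "west"), (3, None)],
)


def ids(where):
    """Return the ids of trains matching a WHERE clause, in id order."""
    sql = f"SELECT id FROM trains WHERE {where} ORDER BY id;"
    return [r[0] for r in conn.execute(sql)]


exact = ids("direction = 'East'")                  # wrong case: no rows
folded = ids("LOWER(direction) = 'east'")          # fold the column's case
nocase = ids("direction = 'East' COLLATE NOCASE")  # case-insensitive comparison
eq_null = ids("direction = NULL")                  # never true
is_null = ids("direction IS NULL")                 # correct NULL test
print(exact, folded, nocase, eq_null, is_null)
```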

### When Percentages Are Wrong
1. Check whether the result was multiplied by 100 when it should not have been (or vice versa)
2. Verify CAST to REAL for division
3. Check if formula matches evidence exactly

### When Column Order Is Wrong
1. Re-read the question for the order in which columns are mentioned
2. Check if evidence specifies different order
3. Match the natural language flow

## 8. Pre-Query Checklist

Before generating SQL:
1. ✓ **Column order**: Matches question order exactly?
2. ✓ **Table selection**: Using correct table (Purchase vs Sales)?
3. ✓ **Complexity**: As simple as possible?
4. ✓ **Percentage**: Multiplied by 100 or not?
5. ✓ **Join path**: Most direct route?
6. ✓ **Date format**: Correct pattern for this database?
7. ✓ **Evidence**: Following formula if provided?

## 9. Database-Specific Patterns

### bike_share_1
- Use station names for joins
- Dates are 'M/D/YYYY H:MM' format
- Temperature is stored in Fahrenheit (convert Celsius values from the question to Fahrenheit)
- Duration in seconds (÷3600 for hours)

### works_cycles
- Navigate through Customer for territories
- Distinguish Purchase vs Sales orders
- Person table has names (FirstName, LastName)
- Store.Name is the store name

### authors
- Papers can be preprints (ConferenceId=0, JournalId=0)
- Rates are usually proportions (not percentages)
- Use DISTINCT to avoid duplicates in joins
- Check Title <> '' to exclude empty titles
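
The last two points can be sketched as follows, assuming SQLite and hypothetical two-column versions of `Paper` and `PaperAuthor` (the real tables have more columns): a join to the author table repeats each paper once per author, so counts need DISTINCT, and empty titles have to be filtered explicitly.

```python
import sqlite3

# Hypothetical miniature of the authors schema; the data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Paper (Id INTEGER, Title TEXT)")
conn.execute("CREATE TABLE PaperAuthor (PaperId INTEGER, AuthorId INTEGER)")
conn.executemany(
    "INSERT INTO Paper VALUES (?, ?)",
    [(1, "Graph Search"), (2, ""), (3, "Query Plans")],
)
conn.executemany(
    "INSERT INTO PaperAuthor VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 12), (3, 10)],
)

JOIN = "FROM Paper p JOIN PaperAuthor pa ON p.Id = pa.PaperId"


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


joined_count = scalar(f"SELECT COUNT(p.Id) {JOIN};")             # counts join rows
distinct_count = scalar(f"SELECT COUNT(DISTINCT p.Id) {JOIN};")  # counts papers
titled_count = scalar(f"SELECT COUNT(DISTINCT p.Id) {JOIN} WHERE p.Title <> '';")
print(joined_count, distinct_count, titled_count)
```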

### trains (KEEP SIMPLE)
- Direct queries work best
- Position 1 = head car
- Simple categorical values (shape, direction)

### ice_hockey_draft (KEEP SIMPLE)
- Straightforward joins
- Year-based queries common
- Draft rounds and picks

## Remember

**SELECTIVE OPTIMIZATION:**
- Simple databases → Simple queries
- Complex databases → Advanced techniques
- Match column order to question order
- Choose correct table from similar options
- Get percentages exactly right

When uncertain:
- Try simpler approach first
- Follow evidence literally
- Match question word order
- Check for similar successful patterns
- Use fallback if primary fails