# SQL Generation Instructions

## Core Requirements

Generate clean, executable SQL:
- No markdown formatting or code blocks
- No comments or explanatory text
- Only the SQL statement
- End with semicolon

## Domain-Specific Patterns

### Medical/Healthcare Databases (Synthea, Patient Records)
- **Age Calculations**: Use `(julianday(date1) - julianday(date2)) / 365.25` or `strftime('%J')`
- **Race vs Ethnicity**: Check exact column names - these are often different columns
- **Medical Codes**: Use exact codes from evidence - medical codes are precise
- **Date Functions**: Prefer `strftime('%Y', date_column)` for year extraction

### Weather/Sales Databases (Sales_in_weather, Climate Data)
- **Date Matching**: When joining sales and weather, match on both date AND location
- **Time Comparisons**: Use `time('HH:MM:SS')` function for sunrise/sunset
- **Conditional Aggregations**: Use `SUM(CASE WHEN condition THEN value ELSE 0 END)` not WHERE
- **Station/Store Relations**: Always join through relation tables

### Recipe/Food Databases (Cookbook, Ingredients)
- **Wildcard Patterns**: Use `%term%` not just `%term` for LIKE
- **Ingredient Names**: Match exact names from database (e.g., "sea bass steak" not "sea bass")
- **Nutritional Conditions**: Use `<` not `BETWEEN 0 AND X` for thresholds
- **Quantity Conditions**: Check for max_qty = min_qty type requirements

## Progressive SQL Building Process

### Step 1: Check Database Domain
Identify patterns from the database analysis that indicate domain

### Step 2: Match Exact Values
- **CRITICAL**: Use exact values from database samples
- Never "correct" or "simplify" values
- If evidence provides a value, use it exactly

### Step 3: Parse Evidence Carefully
When evidence is provided:
- Extract exact column names (e.g., "department refers to organ" → use 'organ')
- Use exact codes/IDs from evidence
- Apply formulas literally as written
- Evidence overrides everything else

### Step 4: Build SQL Components
1. **SELECT clause**: Return ONLY what's requested
   - "What is X?" → SELECT X (not X plus other columns)
   - "How many?" → SELECT COUNT(*) or COUNT(DISTINCT ...)
   - "List Y" → SELECT Y (nothing else)

2. **FROM clause**: Start with primary table

3. **JOIN clauses**:
   - Use proper syntax with aliases
   - Qualify ALL columns with aliases
   - For date-based joins, consider joining on date columns

4. **WHERE clause**:
   - Use exact values (case-sensitive)
   - Apply evidence formulas literally
   - For existence, consider IS NOT NULL

5. **Conditional Aggregations** (when needed):
   ```sql
   SUM(CASE WHEN condition THEN value ELSE 0 END)
   COUNT(CASE WHEN condition THEN 1 ELSE NULL END)
   ```

6. **GROUP BY**: Required for aggregations with non-aggregated columns

7. **ORDER BY and LIMIT**:
   - Use DESC for "highest", "most", "maximum"
   - LIMIT 1 for single result requests

## Critical Rules

### Value Matching
- **Case matters**: 'JOHN' ≠ 'John' ≠ 'john'
- **Exact strings**: 'grass pollen' not 'grass'
- **Codes must match**: Medical/product codes are exact
- Use values from database analysis samples

### Column Selection
- Return ONLY requested columns
- Never add "helpful" extra columns
- For "name", return name not ID
- Count returns just count, not what's counted

### Date/Time Functions
- **Year extraction**: `strftime('%Y', date_column)`
- **Date comparison**: Direct comparison or strftime
- **Time comparison**: `time('HH:MM:SS')` for time values
- **Age calculation**: `(julianday(date1) - julianday(date2)) / 365.25`

### COUNT and DISTINCT
- Use COUNT(DISTINCT column) when:
  - Counting unique dates, patients, items
  - Column ends with _id (except primary key)
  - Evidence mentions "unique" or "different"
- Use regular COUNT for:
  - Counting rows/records
  - When duplicates should be counted

## Common Patterns

### Finding Maximum/Minimum
```sql
-- Preferred for single result
SELECT column FROM table ORDER BY value DESC LIMIT 1

-- Only use subquery when necessary
SELECT x FROM table WHERE value = (SELECT MAX(value) FROM table)
```

### Percentage Calculations
```sql
-- Pattern for percentages
CAST(SUM(CASE WHEN condition THEN 1 ELSE 0 END) AS REAL) * 100 / COUNT(*)
```

### Conditional Sums
```sql
-- Sum only when condition is true
SUM(CASE WHEN condition THEN value ELSE 0 END)
```

### Yes/No Questions
```sql
-- Return 1/0 or 'yes'/'no' based on evidence
SELECT CASE WHEN EXISTS(...) THEN 'yes' ELSE 'no' END
```

## SQLite Specific Syntax
- String concatenation: `||` operator
- Date functions: `strftime()`, `julianday()`, `date()`
- Time functions: `time()` for time comparisons
- Case-insensitive LIKE by default
- Quote identifiers with spaces: "column name" or `column name`

## Final Checklist
- [ ] Using exact values from database?
- [ ] Following domain-specific patterns?
- [ ] Returning ONLY requested columns?
- [ ] Using appropriate date/time functions?
- [ ] COUNT(DISTINCT) where needed?
- [ ] Evidence followed exactly?
- [ ] Clean SQL with semicolon?

Remember: **Exact matching with domain awareness** - Use precise values from the database and apply domain-appropriate patterns.