# Research-driven Refinement Strategy

Having reviewed the files provided to you above, you are building a Claude Code agent that uses a three-artifact architecture with the goal of achieving higher accuracy than any previous agent.  You will be using two strategies in service of your overall goal of achieving higher accuracy on unseen databases:
1. Adopt one agent as a starting point and make targeted changes on top of that agent
2. Read a randomly-chosen research paper that has achieved a high score on this task

So in this strategy you will:

1. **Pick an agent to be your starting point**.  This is your call.  Maybe this is a current top performer, or maybe it is an agent where you see an opportunity to fix it so that it would become the top performer.
2. **Copy the agent as a starting point for your new agent**.  Copy over the whole three-artifact structure.
3. **Change the agent to meet your goals**.  Drawing on your analysis of the errors and system prompts produced by the agent you selected and its competitors, and on your review of the research paper, refine your starting-point agent so that your new agent becomes the most accurate agent going forward.

## Context
You're evolving a database analysis system that consists of three distinct artifacts:
1. **eval_instructions.md** - Static SQL generation instructions passed directly to the eval model
2. **agent.md** - Database analysis agent that examines specific databases and runs tools
3. **tools/** - Python/shell scripts for database analysis 
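
Before going further, it can help to confirm that a copied package really contains all three artifacts. The following is a minimal sketch, not part of the existing system: the artifact names come from the list above, while `missing_artifacts` and the idea of passing the package directory on the command line are hypothetical.

```python
from pathlib import Path
import sys

# Artifact names are taken from the list above; the checker is a hypothetical helper.
REQUIRED_FILES = ("eval_instructions.md", "agent.md")
REQUIRED_DIRS = ("tools",)


def missing_artifacts(package_dir):
    """Return the required artifacts that are absent from package_dir."""
    root = Path(package_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # Assumption: the package directory is the first argument, defaulting to "."
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    problems = missing_artifacts(target)
    if problems:
        print("Missing artifacts: " + ", ".join(problems))
    else:
        print("All three artifacts present")
```

Running this against the directory you copied from your starting-point agent gives a quick signal that nothing was dropped during the copy.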

## Required Research Reading

You MUST read exactly ONE paper from the available research in papers/:

### Paper Selection Strategy:
1. First, run the paper selection tool to choose your paper:
   ```bash
   python ../../RoboPhD/tools/select_research_paper.py
   ```
   This tool will:
   - Select an appropriate paper based on the current context
   - Ensure variety across iterations if using a custom agents directory
   - Write the selected paper path to `evolution_output/iteration_XXX/selected_paper.txt`

2. Read the selected paper from the path provided by the tool

3. The tool tracks which papers have been used (when --agents-directory is specified) to ensure comprehensive exploration of all available methods

The papers are from top BIRD methods achieving 71-77% accuracy.

## Your Mission

Create an evolved agent package using the three-artifact structure to achieve higher accuracy on databases you haven't seen yet.  In developing this agent, you can combine your own ideas with ideas you derive from reading the research paper. 

### Creative Adaptation:
You're not limited to direct implementation. Consider:
- Combining techniques from multiple sections of the paper
- Simplifying complex research methods for practical use
- Creating hybrid approaches that blend paper techniques with existing success patterns
- Inventing new tools inspired by but not directly copying research methods

### How to think about the research:
Here are a few things to keep in mind when reading a paper:
- The key focus of a paper is on describing novel techniques they have invented that have enabled them to achieve high scores on the BIRD benchmark which we also are targeting.  It's worth studying these ideas closely
- However another benefit of reading a paper is to familiarize yourself with common practices in the field.  Discussions of standard techniques that they employ might remind you of something relatively basic that you have overlooked in your system
- Not everything in a paper is going to be practical for your system.  As an example, some papers may be fine-tuning an LLM.  In our work, we can't do that.  The database analysis agent is a Claude Code model and the eval model is a Claude model.  Use your judgment on this.

So, as we have discussed, first you will educate yourself about the problems you are trying to address by studying prior performance metrics, the system prompts that prior systems have generated, and the error patterns in the evaluation.json files.  Then you will read a paper from one of the leading researchers in the field.  Then, keeping all of these learnings in mind, please proceed with your task:

Details about your task are below:

## Required Output Structure

You must create the following files:

### 1. reasoning.md
Your analysis of what to improve and why, based on:
#### Standard section
Your analysis of why your new agent is going to outperform previous agents.  In your discussion, please note which agent you used as a starting point and describe how you plan to improve it.  

#### Academic paper section
- Which paper you randomly chose and its key contributions
- What you learned from the paper
- What ideas you think are most applicable to the three-artifact agent you are creating
- Any adaptations or modifications you had to make to the paper's ideas to fit it into our architecture

### 2. eval_instructions.md
Complete SQL generation instructions for the eval model (which will be generating SQL in response to user questions and optional "evidence").  For example, these instructions might include:
- SQL writing principles and patterns
- Column selection rules (e.g., return ONLY requested columns)
- Error patterns to avoid (based on your error analysis)
- Output format requirements (clean SQL, no markdown)
- Common SQL patterns and examples
- As noted above, you will be starting from the eval_instructions.md of the starting-point model and then modifying them according to your strategy
- These instructions go DIRECTLY to the eval model

### 3. tools/ 
Analysis tools as Python (.py) or shell (.sh) scripts:
- You can use standard Python libraries + sqlite3
- You can also use any other Python library or any command-line tool if you see that it is currently installed
- You should analyze the specific database found at ./database.sqlite
- Generate database-specific guidance
- Output to stdout or files in ./tool_output/
- Include error handling
- The tools should incorporate the ideas you described in reasoning.md.  As noted above, you will be evolving the tools of the starting-point model


When generating these tools, think about what information the eval model will need about this specific database to do its job.  Keep in mind what you told the eval model in eval_instructions.md.  You don't need to repeat that here.  Note that the eval model will not have any information about the database beyond the information that your analysis agent provides.

In contrast to some prior approaches you might see, you will be seeking to implement the vision you laid out in reasoning.md primarily through the use of these tools.  You are permitted to leave some aspects of the DB analysis up to the agent (discussed below), but your focus should be on providing the eval model with the necessary information by writing appropriate scripts.

Example tool structure:
```python
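#!/usr/bin/env python3
import json
import os
import sqlite3

def analyze_database(db_path):
    # Your analysis logic goes here; listing table names is only a placeholder
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
    finally:
        conn.close()
    results = {"tables": tables}

    # Output to tool_output directory (create it if it does not exist yet)
    os.makedirs("tool_output", exist_ok=True)
    with open("tool_output/analysis.json", "w") as f:
        json.dump(results, f, indent=2)

    print("Analysis complete - results in tool_output/analysis.json")

if __name__ == "__main__":
    analyze_database("database.sqlite")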
```
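
As one illustration of "database-specific guidance" with error handling, a tool could profile every column and record a few sample values, since the eval model sees nothing about the database beyond what the analysis provides. This is a sketch under assumptions, not a prescribed tool: `profile_columns`, `quote_ident`, the `sample_size` parameter, and the output file name `column_profile.json` are all hypothetical.

```python
#!/usr/bin/env python3
import json
import os
import sqlite3


def quote_ident(name):
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def profile_columns(db_path, sample_size=3):
    """Collect name, declared type, primary-key flag, and sample values per column."""
    profile = {}
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type='table' AND name NOT LIKE 'sqlite_%'")]
        for table in tables:
            columns = []
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            info = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
            for _, name, col_type, _, _, pk in info:
                try:
                    rows = conn.execute(
                        f"SELECT DISTINCT {quote_ident(name)} FROM {quote_ident(table)} "
                        f"WHERE {quote_ident(name)} IS NOT NULL LIMIT ?",
                        (sample_size,)).fetchall()
                    samples = [row[0] for row in rows]
                except sqlite3.Error as exc:
                    # Record the failure for this column and keep profiling the rest
                    samples = [f"<error: {exc}>"]
                columns.append({"name": name, "type": col_type,
                                "primary_key": bool(pk), "samples": samples})
            profile[table] = columns
    finally:
        conn.close()
    return profile


if __name__ == "__main__":
    db = "database.sqlite"
    if not os.path.exists(db):
        print(f"Error: {db} not found")
    else:
        try:
            result = profile_columns(db)
            os.makedirs("tool_output", exist_ok=True)
            with open("tool_output/column_profile.json", "w") as f:
                # default=str keeps BLOB samples from breaking JSON serialization
                json.dump(result, f, indent=2, default=str)
            print("Profile complete - results in tool_output/column_profile.json")
        except sqlite3.Error as exc:
            print(f"Error while profiling {db}: {exc}")
```

Sample values are often what lets the eval model match a literal in the question to the right column and spelling, which is why this sketch favors them over row counts alone.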

### 4. agent.md
Database analysis agent with YAML frontmatter that MUST include these EXACT two fields:
```yaml
---
name: your-unique-agent-name-here
description: Brief description of the approach you are taking with the overall three-artifact package
---
```

Note 1: The agent name will be used to create a directory (with hyphens converted to underscores).
For evolved agents, the system will prefix your agent name with the iteration number (e.g., iter2_your_agent_name).

Note 2:  As noted above, you will be evolving the agent.md of the starting-point model

Note 3:  Please be sure that the name and description of your agent are different from those of the starting-point model
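
The frontmatter requirement and the naming rule in Note 1 can be checked before you submit. The sketch below is illustrative only: `parse_frontmatter` and `agent_directory` are hypothetical helpers, the parser handles only flat `key: value` lines rather than full YAML, and the `iterN_` prefix format is inferred from the single `iter2_your_agent_name` example above rather than from the system's actual code.

```python
def parse_frontmatter(text):
    """Return key/value pairs from a leading '---' delimited block (flat keys only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent.md must start with a '---' line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' line")


def check_required_fields(fields):
    """Raise if either of the two required fields is absent or empty."""
    for key in ("name", "description"):
        if not fields.get(key):
            raise ValueError(f"frontmatter is missing required field: {key}")


def agent_directory(name, iteration=None):
    """Convert hyphens to underscores; add the assumed iterN_ prefix for evolved agents."""
    directory = name.replace("-", "_")
    if iteration is None:
        return directory
    return f"iter{iteration}_{directory}"


if __name__ == "__main__":
    sample = (
        "---\n"
        "name: your-agent-name\n"
        "description: Brief description of the approach\n"
        "---\n"
        "# Tool Runner Agent\n"
    )
    fields = parse_frontmatter(sample)
    check_required_fields(fields)
    print(agent_directory(fields["name"], iteration=2))
```

Comparing the parsed `name` and `description` against those of the starting-point agent is an easy extension if you want to enforce Note 3 as well.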

The agent and the tools together should document the database in a way that will help the eval model to generate accurate SQL.  When thinking about the agent, please consider the following:
- Your approach is to implement as much as possible through the use of the tools.  Don't instruct the agent to do something that you can accomplish with a script
- So you could decide that your instructions to the agent will be as simple as the below:

```
# Tool Runner Agent

Execute database analysis tools and return the results.

## Process

1. Run the main analysis tool: `python tools/analyze_database.py`
2. Read the complete output from `tool_output/analysis.json`
3. Write the entire contents to `./output/agent_output.txt` as your output

The tools will handle all database analysis. Your role is simply to execute them and pass through the results.
```

On the other hand, there might be some things that are better handled by the AI agent (which will generally be a Claude Code Sonnet-4 model):
- A parameterization of your script or some post-processing which is best handled by an AI
- Error handling or handling of unusual situations
- Tasks that are difficult to capture in a deterministic script and are best delegated to an AI agent


## Success Metrics

Your evolved package should:
- Preserve good general-purpose SQL generation instructions (in eval_instructions.md)
- Provide rich database-specific context (in the output of your tools and agent.md)
- Separate concerns cleanly between the above two elements
- Address specific failure patterns from your analysis
- Be more maintainable and debuggable


## Your overall goal:  Push accuracy higher

You are an expert in the field of Text2SQL.    Use your knowledge of the field, your analysis of what is bringing accuracy down with current agents, and your analysis of the system prompts that prior systems are generating to build an agent package that will achieve higher accuracy than previous agents on a set of databases that you haven't seen before.    

## Important Notes

- Agent has access to: database.sqlite, tools/, tool_output/

Remember: **Think harder** than you normally would about this.  Gather the information you need and use your knowledge and experience to improve accuracy.  Use your judgment to pick an agent as a starting point that you think makes sense, but after that it's your call on how to modify that agent to achieve your goal of pushing accuracy higher.  Use the information you have gathered and your reading of the latest academic research to help you achieve that goal.