# Tool-focused Research-driven Strategy

Having reviewed the files provided to you above, you are refining a Claude Code agent that uses a three-artifact architecture, with the goal of achieving higher accuracy than any previous agent.

## Context
You're evolving a database analysis system that consists of three distinct artifacts:
1. **eval_instructions.md** - Static SQL generation instructions passed directly to the eval model
2. **agent.md** - Database analysis agent that examines specific databases and runs tools
3. **tools/** - Python/shell scripts for database analysis

## Your Task
Create an evolved agent package using the three-artifact structure to achieve higher accuracy on databases you haven't seen yet.

## Required Research Reading

You MUST read exactly ONE paper from the available research in papers/:

### Paper Selection Strategy:
1. First, run the paper selection tool to choose your paper:
   ```bash
   python ../../RoboPhD/tools/select_research_paper.py
   ```
   This tool will:
   - Select an appropriate paper based on the current context
   - Ensure variety across iterations if using a custom agents directory
   - Write the selected paper path to `evolution_output/iteration_XXX/selected_paper.txt`

2. Read the selected paper from the path provided by the tool.

3. The tool tracks which papers have been used (when `--agents-directory` is specified) to ensure comprehensive exploration of all available methods.

The papers are from top BIRD methods achieving 71-77% accuracy.
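If you want to pick up the selection programmatically in step 2, a minimal sketch might look like the following. It assumes that `selected_paper.txt` contains only the paper path, that it is read relative to the directory containing `evolution_output/`, and that iteration directories are zero-padded so that sorting by name matches iteration order. `latest_selected_paper` is a hypothetical helper, not part of the selection tool.

```python
import glob
import os


def latest_selected_paper(base_dir="evolution_output"):
    """Return the paper path recorded in the newest iteration_*/selected_paper.txt.

    Hypothetical helper. Assumes zero-padded iteration directories
    (iteration_001, iteration_002, ...) so lexicographic order is iteration order.
    """
    pattern = os.path.join(base_dir, "iteration_*", "selected_paper.txt")
    candidates = sorted(glob.glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"no selected_paper.txt found under {base_dir}")
    # Assumes the file holds just the path, possibly with a trailing newline
    with open(candidates[-1]) as f:
        return f.read().strip()
```

Calling `latest_selected_paper()` from the workspace root would then return the path to open.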

## Your Mission

Create an evolved three-artifact agent package that improves SQL generation accuracy by incorporating proven techniques from academic research. You have complete creative freedom in adapting research insights to our architecture.

### Creative Adaptation:
You're not limited to direct implementation. Consider:
- Combining techniques from multiple sections of the paper
- Simplifying complex research methods for practical use
- Creating hybrid approaches that blend paper techniques with existing success patterns
- Inventing new tools inspired by but not directly copying research methods

### How to think about the research:
Here are a few things to keep in mind when reading a paper:
- The key focus of a paper is describing the novel techniques its authors invented to achieve high scores on the BIRD benchmark, which we are also targeting. It's worth studying these ideas closely.
- However, another benefit of reading a paper is familiarizing yourself with common practices in the field. Discussions of the standard techniques the authors employ might remind you of something relatively basic that you have overlooked in your system.
- Not everything in a paper is going to be practical for your system. For example, some papers may be fine-tuning an LLM, which we can't do in our work: the database analysis agent is a Claude Code model and the eval model is a Claude model. Use your judgment on this.

So, as we have discussed, you will first educate yourself about the problems you are trying to address by studying prior performance metrics, the system prompts that prior systems have generated, and the error patterns in the evaluation.json files. Then you will read a paper from one of the leading researchers in the field. Then, keeping all of these learnings in mind, please proceed with your task:


## Required Output Structure

You must create the following files:

### 1. reasoning.md
Your analysis of what to improve and why, based on:
#### Standard section
- Review of system prompts from best performers
- Analysis of error patterns in evaluation.json files
- Specific hypotheses for improvement

#### Academic paper section
- Which paper was selected for you and its key contributions
- What you learned from the paper
- What ideas you think are most applicable to the three-artifact agent you are creating
- Any adaptations or modifications you had to make to the paper's ideas to fit them into our architecture

### 2. eval_instructions.md
Complete SQL generation instructions for the eval model (which will be generating SQL in response to user questions and optional "evidence"). For example, these instructions might include:
- SQL writing principles and patterns
- Column selection rules (e.g., return ONLY requested columns)
- Error patterns to avoid (based on your error analysis)
- Output format requirements (clean SQL, no markdown)
- Common SQL patterns and examples
- Write your own instructions, as appropriate, to achieve the goals laid out in reasoning.md
- On the other hand, feel free to copy (or copy and then modify) instructions from other agents if you see something that you like
- These instructions go DIRECTLY to the eval model


### 3. tools/ 
Analysis tools as Python (.py) or shell (.sh) scripts:
- You can use standard Python libraries + sqlite3
- You can also use any other Python library or any command-line tool if you see that it is currently installed
- You should analyze the specific database found at ./database.sqlite
- Generate database-specific guidance
- Output to stdout or files in ./tool_output/
- Include error handling
- Write your own scripts, as appropriate, to achieve the goals laid out in reasoning.md
- On the other hand, feel free to copy (or copy and then modify) scripts from other agents if you see something that you like
- The tools should incorporate the ideas you described in reasoning.md
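As one hypothetical illustration of database-specific guidance: since the eval model only knows what the analysis surfaces, a tool might enumerate the distinct values of low-cardinality text columns, so that literals in a question or its evidence can be matched to the exact stored spelling. The sketch below assumes that "text column" means SQLite TEXT affinity (a declared type containing CHAR, CLOB, or TEXT), so untyped columns are skipped; `low_cardinality_values` and the threshold of 20 are illustrative choices, not requirements.

```python
import sqlite3


def _quote(identifier):
    # Double-quote an SQLite identifier, escaping any embedded quotes
    return '"' + identifier.replace('"', '""') + '"'


def low_cardinality_values(db_path, max_distinct=20):
    """Map "table.column" to the sorted distinct non-NULL values of each
    TEXT-affinity column that has at most max_distinct distinct values."""
    conn = sqlite3.connect(db_path)
    found = {}
    try:
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            for row in conn.execute(f"PRAGMA table_info({_quote(table)})").fetchall():
                column, declared = row[1], (row[2] or "").upper()
                # SQLite assigns TEXT affinity to declared types containing these words
                if not any(word in declared for word in ("CHAR", "CLOB", "TEXT")):
                    continue
                col, tbl = _quote(column), _quote(table)
                # COUNT(DISTINCT ...) ignores NULLs
                n = conn.execute(f"SELECT COUNT(DISTINCT {col}) FROM {tbl}").fetchone()[0]
                if 0 < n <= max_distinct:
                    values = conn.execute(
                        f"SELECT DISTINCT {col} FROM {tbl} "
                        f"WHERE {col} IS NOT NULL ORDER BY 1").fetchall()
                    found[f"{table}.{column}"] = [v[0] for v in values]
    finally:
        conn.close()
    return found
```

`COUNT(DISTINCT ...)` scans every row of each candidate column, so on very large tables you might prefer to cap or sample the scan.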


When generating these tools, think about what information the eval model will need about this specific database to do its job. Keep in mind what you told the eval model in eval_instructions.md; you don't need to repeat that here. Note that the eval model will not have any information about the database beyond what your analysis agent provides.

In contrast to some prior approaches you might see, you will implement the vision you laid out in reasoning.md primarily through these tools. You may leave some aspects of the database analysis to the agent (discussed below), but your focus should be on writing scripts that give the eval model the information it needs.

Example tool structure:
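```python
#!/usr/bin/env python3
import json
import os
import sqlite3
import sys

def analyze_database(db_path):
    # Fail early instead of letting sqlite3.connect create an empty database
    if not os.path.exists(db_path):
        raise FileNotFoundError(f"database not found: {db_path}")

    conn = sqlite3.connect(db_path)
    try:
        # Your analysis logic; this placeholder only lists the user tables
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type='table' AND name NOT LIKE 'sqlite_%'")]
        results = {"tables": tables}
    finally:
        conn.close()

    # Output to tool_output directory, creating it if needed
    os.makedirs("tool_output", exist_ok=True)
    with open("tool_output/analysis.json", "w") as f:
        json.dump(results, f, indent=2)

    print("Analysis complete - results in tool_output/analysis.json")

if __name__ == "__main__":
    try:
        analyze_database("database.sqlite")
    except (OSError, sqlite3.Error) as e:
        # Report the failure on stderr so the agent can see what went wrong
        print(f"Analysis failed: {e}", file=sys.stderr)
        sys.exit(1)
```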
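To make "your analysis logic" more concrete, here is one possible sketch of a schema profiler built only on `sqlite3` and SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`. It records each table's columns, primary-key flags, declared foreign keys, and row count. The function name `describe_schema` and the output layout are assumptions for illustration. If a database declares no foreign keys, the `foreign_keys` lists will be empty and join paths would have to be inferred some other way.

```python
import sqlite3


def _quote(identifier):
    # Double-quote an SQLite identifier, escaping any embedded quotes
    return '"' + identifier.replace('"', '""') + '"'


def describe_schema(db_path):
    """Return {table: {"row_count", "columns", "foreign_keys"}} for every user table."""
    conn = sqlite3.connect(db_path)
    schema = {}
    try:
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        for table in tables:
            t = _quote(table)
            # table_info rows: (cid, name, type, notnull, dflt_value, pk)
            columns = [{"name": r[1], "type": r[2], "primary_key": bool(r[5])}
                       for r in conn.execute(f"PRAGMA table_info({t})")]
            # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
            # "to" is NULL when the reference targets the parent's primary key implicitly
            fks = [{"column": r[3],
                    "references": f"{r[2]}.{r[4]}" if r[4] else r[2]}
                   for r in conn.execute(f"PRAGMA foreign_key_list({t})")]
            row_count = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            schema[table] = {"row_count": row_count, "columns": columns,
                             "foreign_keys": fks}
    finally:
        conn.close()
    return schema
```

Inside `analyze_database`, something like `results = {"schema": describe_schema(db_path)}` could then be written to `tool_output/analysis.json`.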

### 4. agent.md
Database analysis agent with YAML frontmatter that MUST include these EXACT two fields:
```yaml
---
name: your-unique-agent-name-here
description: Brief description of the approach you are taking with the overall three-artifact package
---
```

Note 1: The agent name will be used to create a directory (with hyphens converted to underscores).
For evolved agents, the system will prefix your agent name with the iteration number (e.g., iter2_your_agent_name).
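If you want to sanity-check agent.md before submitting, a minimal sketch is below. It reads only flat `key: value` frontmatter (it is not a YAML parser), checks that `name` and `description` are present, and mirrors the naming rule from Note 1. The helper names are hypothetical, and the real system's parsing may differ.

```python
def parse_frontmatter(text):
    """Parse flat 'key: value' lines between the opening and closing '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent.md must start with a '---' frontmatter line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if not line.strip():
            continue  # tolerate blank lines inside the block
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unparseable frontmatter line: {line!r}")
        fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' line")


def check_agent_md(text):
    # Both required fields must be present
    fields = parse_frontmatter(text)
    missing = {"name", "description"} - set(fields)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return fields


def agent_directory_name(name, iteration):
    # Hyphens become underscores; evolved agents get an iterN_ prefix (Note 1)
    return f"iter{iteration}_{name.replace('-', '_')}"
```

For example, an agent named `schema-profiler` in iteration 2 would map to the directory `iter2_schema_profiler`.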

Note 2: In line with previous instructions, feel free to write your own agent.md or to copy elements from the agent.md files of other agents, as needed to achieve your goals.

The agent and the tools together should document the database in a way that will help the eval model to generate accurate SQL.  When thinking about the agent, please consider the following:
- Your approach is to implement as much as possible through the tools. Don't instruct the agent to do something that you can accomplish with a script.
- So you could decide that your instructions to the agent will be as simple as the following:

```
# Tool Runner Agent

Execute database analysis tools and return the results.

## Process

1. Run the main analysis tool: `python tools/analyze_database.py`
2. Read the complete output from `tool_output/analysis.json`
3. Return the entire contents as your output to `./output/agent_output.txt`

The tools will handle all database analysis. Your role is simply to execute them and pass through the results.
```

On the other hand, there might be some things that are better handled by the AI agent (which will generally be a Claude Code Sonnet-4 model):
- A parameterization of your script or some post-processing which is best handled by an AI
- Error handling or handling of unusual situations
- Tasks that are difficult to capture in a deterministic script and are best delegated to an AI agent


## Success Metrics

Your evolved package should:
- Preserve good general-purpose SQL generation instructions (in eval_instructions.md)
- Provide rich database-specific context (in the output of your tools and agent.md)
- Separate concerns cleanly between the above two elements
- Address specific failure patterns from your analysis
- Be more maintainable and debuggable


## Important Notes

- Agent has access to: database.sqlite, tools/, tool_output/
- Start with the best performer on the most recent iteration and make targeted improvements

Remember: **Think harder** than you normally would about this. Your overall goal is to aid SQL generation in two ways: by providing a sophisticated analysis of the database in question, and by providing SQL generation instructions to the SQL generation model. You then want the SQL generation model to accurately generate SQL in response to natural language questions along with the optional evidence provided. To get there, you are drawing on your understanding of existing errors, your understanding of how the current best model works, and your reading of the latest academic research.
