# RoboPhD Package

This package contains the RoboPhD research system. This packaging allows you to:
- run test-eval against a given agent configuration
- run dev-eval against a given agent configuration
- run the agent evolution process from scratch, replicating how a new agent is created by
  evolving simpler agents

NOTE: If you are on the BIRD team, this zip includes an `.anthropic_key` file
pre-populated with a temporary API key that you can use for the test-eval run. If you
downloaded this as supplemental material for the paper in order to replicate the results,
you will have to obtain a Claude API key and paste it (with no additional whitespace) into
a file named `.anthropic_key` in the root of the unzipped location.
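
Stray whitespace in the key file is easy to miss. The following is a minimal sketch of a check, not part of the package: `key_file_problems` is a hypothetical helper, and it assumes the strictest reading of "no additional whitespace", so even a trailing newline is reported.

```python
from pathlib import Path


def key_file_problems(path):
    """Return a list of problems found in the key file (empty list if none)."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist"]
    raw = p.read_text()
    problems = []
    if not raw.strip():
        problems.append("file is empty")
    elif raw != raw.strip():
        # Strict assumption: a final newline also counts as extra whitespace.
        problems.append("leading or trailing whitespace (including a final newline)")
    elif any(ch.isspace() for ch in raw):
        problems.append("whitespace inside the key")
    return problems


if __name__ == "__main__":
    # Run from the root of the unzipped package.
    issues = key_file_problems(".anthropic_key")
    print("OK" if not issues else "; ".join(issues))
```

If a trailing newline is reported, rewriting the file with a tool that does not append one (for example `printf '%s' "<key>" > .anthropic_key`) is one way to remove it.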

## Quick Start

### 1. Install basic tools

We assume that you are running a fresh install of Ubuntu 24, though these instructions translate easily
to macOS or other Linux distros.

1. Install npm, python, and other dependencies
```bash
sudo apt update
sudo apt install npm unzip python3 python3-venv jq
```

2. Install Claude Code, which we invoke as an agent during the analysis
phase to generate database profiling info. Note that later steps update
the user's ~/.claude.json file, so we assume that Claude Code isn't
currently installed.

```bash
sudo npm install -g @anthropic-ai/claude-code
```

3. Create a virtual environment and install the requirements. After you run
`source .venv/bin/activate` you should see the (robophd) prompt decoration;
verify that it is there every time you run a python command below.

```bash
cd <directory where you unzipped robophd>
python3 -m venv .venv --prompt robophd
source .venv/bin/activate
pip install -r requirements.txt
```
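
Besides looking for the prompt decoration, the interpreter itself can report whether it is running inside a virtual environment. This is a minimal sketch; `in_virtualenv` is an illustrative name, not something the package provides.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"venv active: {sys.prefix}")
    else:
        print("venv NOT active: run `source .venv/bin/activate` first")
```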

### 2. Configure the environment

1. Run the setup script, which configures some user-specific and project-specific
Claude Code settings so that you don't have to do any interactive setup of the
Claude Code CLI.

```bash
./setup_robophd_for_testeval.sh
```
You should get a 'success' message if this runs to completion. If you get one or two messages stating:

> Error getting API key from apiKeyHelper (in settings or ~/.claude.json): apiKeyHelper did not return a valid value

That's OK: it will succeed on the third or so try. You can also re-run `./setup_robophd_for_testeval.sh`;
the second run should be clean.

2. Copy in the datasets

Extract the test dataset into the sub-directory:

`benchmark_resources/datasets/test/test/`

After extraction, this file should exist:
`benchmark_resources/datasets/test/test/test.json`
and the databases themselves should be in
`benchmark_resources/datasets/test/test/test_databases/<db_name>/<db_name>.sqlite`

(optional dev set)

Extract the dev dataset into the sub-directory:
`benchmark_resources/datasets/dev/dev_20240627`

After extraction, these files should exist in this sub-directory
under the root where you extracted (i.e. where this README.md lives):
`benchmark_resources/datasets/dev/dev_20240627/dev.json`
`benchmark_resources/datasets/dev/dev_20240627/dev_databases/california_schools/california_schools.sqlite`

(optional train set)

Extract the train dataset into the sub-directory:
`benchmark_resources/datasets/train/train`
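
The layout of the required test dataset can be checked before launching anything. This is a minimal sketch under the path conventions stated above; `check_test_dataset` is a hypothetical helper and is not shipped with the package.

```python
from pathlib import Path


def check_test_dataset(root="."):
    """Return (missing_paths, database_count) for the test dataset layout."""
    base = Path(root) / "benchmark_resources" / "datasets" / "test" / "test"
    missing = []
    if not (base / "test.json").is_file():
        missing.append(str(base / "test.json"))
    db_root = base / "test_databases"
    if not db_root.is_dir():
        missing.append(str(db_root))
        return missing, 0
    count = 0
    # Each database folder should hold <db_name>/<db_name>.sqlite.
    for db_dir in sorted(p for p in db_root.iterdir() if p.is_dir()):
        sqlite_file = db_dir / f"{db_dir.name}.sqlite"
        if sqlite_file.is_file():
            count += 1
        else:
            missing.append(str(sqlite_file))
    return missing, count


if __name__ == "__main__":
    # Run from the directory where this README.md lives.
    missing, count = check_test_dataset(".")
    print(f"{count} database(s) found")
    for path in missing:
        print(f"missing: {path}")
```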

### 3. Test Installation

```bash
# Run smoke test to verify setup
python test_robophd_smoke.py
```

This will launch and then sit at `Iteration1 starting...` for about 2 minutes. Be patient! It is doing a
small end-to-end run on a small sample of data to exercise the whole pipeline. At the end you should see
the message "ALL SMOKE TESTS PASSED!" if it ran successfully. You are then ready to start the full
run.

### 4. Actual test-eval execution

The `RoboPhD_experiments/prod_submit1` directory contains the agent to test.

Make sure that the robophd virtual environment is activated with
`source .venv/bin/activate` (it will still be active from before if you haven't deactivated it).

Then run the eval with:

```bash
python RoboPhD/researcher.py \
  --agents-directory RoboPhD_experiments/prod_submit1 \
  --test-eval \
  --eval-model sonnet-4 \
  --initial-agents 0915_i060_column_order_precision
```

If you want to run dev-eval instead, make sure that dev.json and the dev databases are in the
expected location (see step 2, "Configure the environment"), then run the exact command above with
`--test-eval` changed to `--dev-eval`.
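
Since the two runs differ only in one flag, a small wrapper can build either command. This is a sketch only: `build_eval_command` is a hypothetical helper, and the flags and values are copied from the command above rather than from any documented API.

```python
import shlex
import sys


def build_eval_command(mode="test"):
    """Build the researcher.py command line for test-eval or dev-eval."""
    if mode not in ("test", "dev"):
        raise ValueError("mode must be 'test' or 'dev'")
    return [
        sys.executable,  # the interpreter of the active virtual environment
        "RoboPhD/researcher.py",
        "--agents-directory", "RoboPhD_experiments/prod_submit1",
        f"--{mode}-eval",
        "--eval-model", "sonnet-4",
        "--initial-agents", "0915_i060_column_order_precision",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to actually launch the eval.
    print(shlex.join(build_eval_command("dev")))
```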

### 5. Running agent evolution

Running the full "training" process that evolves new agents is similar to the other commands. 
Here is the command to re-run from scratch:

```bash
python RoboPhD/researcher.py \
  --num-iterations 80 \
  --eval-model sonnet-4 \
  --evolution-weighted-random '{"research_driven": 30, "refinement": 30, "use_judgment_focus_on_errors": 15,
                          "use_judgment_focus_on_prompt": 15, "none":10 }' \
  --evolution-default weighted_random \
  --agents-directory RoboPhD_experiments/prod_run5_ablate_research/ \
  --initial-agents minimal_3a pass0_template_miner_agent pass0_precision_tools_orchestrator
```

As this runs, it emits per-iteration results into a new run-specific folder under robophd_evaluation whose
name includes the timestamp when the run was launched. Inspect the final_report.md in that folder to find
the top agent in the run. If you then want to run test-eval or dev-eval on it, copy that winning agent's folder
out of `research/parallel_agent_yyyymmdd_hhmmss/iteration_xxx/agents/<agent name>`, paste it into
`RoboPhD_experiments/prod_submit1` next to the others, and update the `--initial-agents` argument to name the agent
that you want to eval.
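
The copy step can also be scripted. This is a minimal sketch: `copy_agent` is a hypothetical helper, and the source path passed to it must be the real agent folder from your own run (the timestamp, iteration, and agent name are placeholders in the text above).

```python
import shutil
from pathlib import Path


def copy_agent(agent_dir, dest_root="RoboPhD_experiments/prod_submit1"):
    """Copy one agent folder into dest_root, refusing to overwrite."""
    src = Path(agent_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"agent folder not found: {src}")
    dest = Path(dest_root) / src.name
    if dest.exists():
        raise FileExistsError(f"destination already exists: {dest}")
    shutil.copytree(src, dest)
    return dest
```

The folder name returned by `copy_agent(...).name` is the value to pass to `--initial-agents`.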

