# Baseline experiment scripts

## Setup
1. Create the virtual environment.
```sh
uv sync
source .venv/bin/activate
```
2. Set your API keys in the [../.env](../.env) file (see the example: [../.env.example](../.env.example)). You only need to fill in the fields for the providers you use.
```sh
ANTHROPIC_API_KEY="Your Anthropic API Key"
ANTHROPIC_AUTH_TOKEN="Your Anthropic Auth Token"
AWS_ACCESS_KEY_ID="Your AWS Access Key ID"
AWS_SECRET_ACCESS_KEY="Your AWS Secret Access Key"
AWS_SESSION_TOKEN="Your AWS Session Token"
AWS_REGION="AWS Region"
GCP_PROJECT_ID="Your Google Cloud Platform Project ID"
GCP_REGION="Google Cloud Platform Region"
OPENAI_API_KEY="Your OpenAI API Key"
OPENAI_ORG_ID="Your OpenAI Organization ID"
OPENAI_PROJECT_ID="Your OpenAI Project ID"
GEMINI_API_KEY="Your Gemini API Key"
DEEPSEEK_API_KEY="Your DeepSeek API Key"
OPENROUTER_API_KEY="Your OpenRouter API Key"
```
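
Before launching a run, it can help to confirm which keys are actually filled in. The following is a minimal sketch, not part of this repository: `parse_env` and `configured_keys` are hypothetical helpers, and it assumes both files use plain `KEY="value"` lines like the example above. It treats a key as configured when its value is non-empty and differs from the placeholder in `../.env.example`.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def configured_keys(env: dict[str, str], example: dict[str, str]) -> list[str]:
    """Return keys whose value is non-empty and differs from the example placeholder."""
    return sorted(k for k, v in env.items() if v and v != example.get(k))


if __name__ == "__main__":
    # Paths are assumed relative to this directory, as in the links above.
    env_path = Path("../.env")
    example_path = Path("../.env.example")
    if env_path.exists() and example_path.exists():
        keys = configured_keys(
            parse_env(env_path.read_text()),
            parse_env(example_path.read_text()),
        )
        print("Configured keys:", ", ".join(keys) or "(none)")
    else:
        print("Could not find ../.env or ../.env.example")
```

The check only reports key names, so it never prints secret values.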

## Run
1. Run one-shot experiments (generate up to 5 codes and stop at the first accepted one). See also: [scripts/one_shot.sh](scripts/one_shot.sh)
```sh
cd /home/ubuntu/experiments/baselines
python run_openai_model.py \
    --problem_id ahc046 --num_workers 13 --code_language cpp20 \
    --first_accept --num_codes 5 --num_no_code_patience 3 \
    --model o4-mini-2025-04-16 --reasoning_effort high \
    --exp_dir results/first_accept/o4-mini-high_cpp20
```
2. Run iterative-refinement (4 hours) experiments. See also: [scripts/iterative_refinement.sh](scripts/iterative_refinement.sh)
```sh
python run_openai_model.py \
    --problem_id ahc046 --num_workers 13 --code_language cpp20 \
    --duration 4 --summarize --num_no_code_patience 3 \
    --model o4-mini-2025-04-16 --reasoning_effort high \
    --exp_dir results/four_hours/o4-mini-high_cpp20
```
3. Run private evaluations (if necessary). Because of LLM API latency, the private evaluation sometimes failed to start within the ALE-Bench session time limit (i.e., the session timed out). In those cases, we ran the private evaluation manually with [rerun_private_eval.py](rerun_private_eval.py).
