## Quick Start

```bash
# 1. Install requirements (apptainer, fuse-overlayfs)

# 2. Build the container
bash containers/build_container.sh standard

# 3. Download HuggingFace cache
bash containers/download_hf_cache/download_hf_cache.sh

# 4. Set API keys
export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export GEMINI_API_KEY="your-key"

# 5. Run jobs
bash src/commit_utils/commit.sh
```

Currently, only HTCondor is supported as a cluster type.
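Before submitting jobs, it can help to confirm that the prerequisites from steps 1 and 4 are in place. The following is a minimal sketch and not part of the repository: the function name `missing_prerequisites` is hypothetical, and it assumes that the tools are available on `PATH` as `apptainer` and `fuse-overlayfs` and that all three API keys are needed for your runs.

```python
import os
import shutil

# Assumption: both tools are installed under these executable names.
REQUIRED_TOOLS = ("apptainer", "fuse-overlayfs")
# Assumption: all three keys are needed; trim this if you use fewer providers.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of missing tools and unset (or empty) API keys."""
    env = os.environ if env is None else env
    # A tool counts as missing when it cannot be found on PATH.
    missing = [f"tool: {tool}" for tool in REQUIRED_TOOLS if which(tool) is None]
    # A key counts as missing when it is unset or set to an empty string.
    missing += [f"env: {key}" for key in REQUIRED_KEYS if not env.get(key)]
    return missing
```

Calling `missing_prerequisites()` with no arguments checks the current shell environment; an empty list means nothing was found missing. The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.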

## Code Structure

| Directory | Description |
|-----------|-------------|
| `agents/` | Agent implementations |
| `containers/` | Container definition, cache downloads |
| `src/` | Main codebase |
| `src/commit_utils/` | Job submission utilities (e.g., `bash src/commit_utils/commit.sh`) |
| `src/baselines/` | Scripts to compute baseline scores |
| `src/eval/` | Evaluation tasks |
| `results/` | Evaluation results (baseline runs prefixed with `baseline_`) |

Each evaluation folder in `src/eval/tasks/` contains:
- `benchmark.txt`: Official benchmark name
- `evaluate.py`: Evaluation script
- `task_context/` (optional): Additional files for the agent, for example a description of exactly how the evaluation is performed, so that the agent doesn't have to guess.
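When adding a new task, the layout above can be checked with a few lines of Python. This is a sketch under stated assumptions, not a utility shipped with the repository: the function names `missing_task_files` and `check_all_tasks` are hypothetical, and it assumes every subdirectory of `src/eval/tasks/` is a task folder.

```python
from pathlib import Path

# Files every task folder is expected to contain; task_context/ is optional.
REQUIRED_FILES = ("benchmark.txt", "evaluate.py")


def missing_task_files(task_dir):
    """Return the required file names that are absent from one task folder."""
    task_dir = Path(task_dir)
    return [name for name in REQUIRED_FILES if not (task_dir / name).is_file()]


def check_all_tasks(tasks_root="src/eval/tasks"):
    """Map each incomplete task folder name to its list of missing files."""
    root = Path(tasks_root)
    report = {}
    # Assumption: every subdirectory of tasks_root is a task folder.
    for task_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = missing_task_files(task_dir)
        if missing:
            report[task_dir.name] = missing
    return report
```

Run from the repository root, `check_all_tasks()` returns an empty dictionary when every task folder has both required files.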