# CIRBench (Packed Skeleton)

This package contains an anonymized, self-contained skeleton of **CIRBench** for reproducing the main evaluation pipeline described in the paper.

- The evaluation protocol and toolchain details are documented in **Appendix H** of the paper.
- This README provides practical steps to build the dependencies and run a small sanity-check subset.

## Contents

- Prerequisites
- Quickstart (build and install)
- Configuration
- API keys (if using hosted model APIs)
- Audit and list tasks
- Run a single instance (recommended)
- Run artifacts
- Aggregate and export tables
- Running the full benchmark (warning)

---

## Prerequisites

- Linux (x86_64 recommended)
- Python 3.10+ (recommended)
- Standard build tooling: `cmake`, `ninja` or `make`, a C/C++ toolchain, and `git`
- Alive2's own build dependencies, notably Z3 and `re2c` (see the upstream Alive2 README)
- Roughly 10 GB of free disk space for building LLVM and Alive2
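
Before starting a long build, it can help to confirm that the tools listed above are actually on `PATH`. The following is a minimal sketch, not part of CIRBench: `missing_tools` is a hypothetical helper, and the command names `cc` and `c++` are an assumption about how your C/C++ toolchain is exposed.

```bash
# Report which of the given commands are missing from PATH.
# Prints one line per missing command and returns non-zero if any are missing.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# Swap `make` for `ninja` if that is the generator you intend to use.
missing_tools cmake make git python3 cc c++ \
  || echo "Install the missing tools before building."

# Python 3.10 or newer is recommended.
python3 -c 'import sys; raise SystemExit(0 if sys.version_info >= (3, 10) else 1)' \
  || echo "python3 is missing or older than 3.10."

# Free space on the filesystem that holds the current directory.
df -h .
```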

---

## Quickstart (build and install)

### 1) Enter the repository and set `CIRBENCH_HOME`

```bash
cd /path/to/CIRBench
export CIRBENCH_HOME="$(pwd)"
```

### 2) Create the Python environment and install CIRBench

```bash
bash env/setup_env.sh
source .venv/bin/activate
pip install -e .
```

### 3) Build LLVM (19.1.0)

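If you already have a compatible LLVM 19.1.0 build (with RTTI and exception handling enabled, as in the flags below), you can point to it in `configs/cirbench.yaml` and skip this step. In that case, also set `CMAKE_PREFIX_PATH` in step 4 to your existing build directory.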

```bash
cd "$CIRBENCH_HOME"
git clone --depth=1 --branch llvmorg-19.1.0 https://github.com/llvm/llvm-project.git
cd "$CIRBENCH_HOME/llvm-project"

mkdir -p build && cd build
cmake -G "Unix Makefiles" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_TARGETS_TO_BUILD="host" \
  -DLLVM_ENABLE_RTTI=ON \
  -DLLVM_ENABLE_EH=ON \
  ../llvm
make -j"$(nproc)"
```
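
Once the build finishes, a quick check that the expected executables exist can save time later. This is a sketch under assumptions: `check_llvm_build` is a hypothetical helper, and `clang` and `opt` are only a guess at which binaries matter most here; the build places its executables in `build/bin`.

```bash
# Check that an LLVM build directory contains the expected executables.
# Usage: check_llvm_build <build-dir>
check_llvm_build() {
  local build_dir=$1 tool rc=0
  for tool in clang opt; do
    if [ ! -x "$build_dir/bin/$tool" ]; then
      echo "not found: $build_dir/bin/$tool"
      rc=1
    fi
  done
  return "$rc"
}

llvm_build="${CIRBENCH_HOME:-.}/llvm-project/build"
if check_llvm_build "$llvm_build"; then
  # The reported LLVM version should be 19.1.0.
  "$llvm_build/bin/opt" --version
fi
```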

### 4) Build Alive2 (v19.0)

```bash
cd "$CIRBENCH_HOME/alive2"
git checkout v19.0 -f

mkdir -p build && cd build
cmake -G "Unix Makefiles" \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_TV=1 \
  -DCMAKE_PREFIX_PATH="$CIRBENCH_HOME/llvm-project/build" \
  ..
make -j"$(nproc)"
```

Return to CIRBench:

```bash
cd "$CIRBENCH_HOME"
```
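
To sanity-check the Alive2 build on its own, you can run the translation validator on a trivially correct pair of IR files. The sketch below assumes the build wrote `alive-tv` to the top of the `alive2/build` directory, as the upstream Alive2 README describes; the two `.ll` files are illustrative and not part of CIRBench.

```bash
alive_tv="${CIRBENCH_HOME:-.}/alive2/build/alive-tv"

# Write a trivially correct source/target pair: x + 0 simplifies to x.
workdir="$(mktemp -d)"
cat > "$workdir/src.ll" <<'EOF'
define i32 @f(i32 %x) {
  %r = add i32 %x, 0
  ret i32 %r
}
EOF
cat > "$workdir/tgt.ll" <<'EOF'
define i32 @f(i32 %x) {
  ret i32 %x
}
EOF

# alive-tv takes the source file first and the target file second.
if [ -x "$alive_tv" ]; then
  "$alive_tv" "$workdir/src.ll" "$workdir/tgt.ll"
else
  echo "alive-tv not found at $alive_tv; check the Alive2 build output."
fi
```

If the toolchain is healthy, `alive-tv` should report the transformation as correct.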

---

## Configuration

Edit the configuration file:

- `configs/cirbench.yaml`

Most users only need to set toolchain paths (LLVM / Alive2) if they are not in default locations.
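
To find the entries you may need to change without reading the whole file, you can search it for toolchain-related lines. This sketch makes no assumption about key names in `configs/cirbench.yaml`; it only greps for the strings `llvm` and `alive`, and `show_toolchain_settings` is a hypothetical helper.

```bash
# Print config lines that mention the LLVM or Alive2 toolchains, with line numbers.
show_toolchain_settings() {
  grep -n -i -E 'llvm|alive' "$1"
}

cfg="${CIRBENCH_HOME:-.}/configs/cirbench.yaml"
if [ -f "$cfg" ]; then
  show_toolchain_settings "$cfg" || echo "No LLVM/Alive2 entries found in $cfg"
fi
```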

---

## API keys (if using hosted model APIs)

If you evaluate models via hosted APIs, configure keys in your shell environment before running `cirbench`.

Example (Gemini):

```bash
export GEMINI_API_KEY="YOUR_KEY_HERE"
```
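
A key that is set but not exported is invisible to child processes such as `cirbench`. The following sketch checks for that case before you spend time on a run; `require_env` is a hypothetical helper, not a CIRBench command.

```bash
# Return non-zero (with a message on stderr) if any named variable is
# not exported or is empty. Uses printenv, which only sees exported variables.
require_env() {
  local name
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "error: $name is not exported or is empty" >&2
      return 1
    fi
  done
}

require_env GEMINI_API_KEY || echo "Export GEMINI_API_KEY before running cirbench."
```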

---

## Audit and list tasks

Run a quick environment audit:

```bash
cirbench doctor --cfg configs/cirbench.yaml
```

List registered tasks and available instances:

```bash
cirbench list --task analysis
cirbench list --task transform
cirbench list --task repair
cirbench list --task refactor
```

---

## Run a single instance (recommended)

For reproducibility and to avoid unnecessary API usage, start with a single instance per task.

Set a run ID first. All outputs are written under `runs/$CIRBENCH_RUN_ID/`, so keep the same ID for every command that belongs to one run.

```bash
export CIRBENCH_RUN_ID=$(date -u +"%Y-%m-%dT%H-%M-%SZ")
```

### Analysis (single instance)

```bash
cirbench run \
  --task analysis \
  --model gemini:gemini-2.5-flash \
  --cfg configs/cirbench.yaml \
  --debug \
  --concurrency 1 \
  --select "A001_alias_001"
```

### Repair (single instance)

```bash
# normal mode
cirbench run \
  --task repair \
  --model gemini:gemini-2.5-flash \
  --cfg configs/cirbench.yaml \
  --repair-mode normal \
  --debug \
  --concurrency 1 \
  --select "Repair_001"

# hard mode
cirbench run \
  --task repair \
  --model gemini:gemini-2.5-flash \
  --cfg configs/cirbench.yaml \
  --repair-mode hard \
  --debug \
  --concurrency 1 \
  --select "Repair_001"
```

### Refactor (single instance)

```bash
# normal mode
cirbench run \
  --task refactor \
  --model gemini:gemini-2.5-flash \
  --cfg configs/cirbench.yaml \
  --refactor-mode normal \
  --debug \
  --concurrency 1 \
  --select "RF003_EarlyCSE_001"

# reverse mode
cirbench run \
  --task refactor \
  --model gemini:gemini-2.5-flash \
  --cfg configs/cirbench.yaml \
  --refactor-mode reverse \
  --debug \
  --concurrency 1 \
  --select "RF003_EarlyCSE_001"
```

### Transform (single instance)

```bash
# direct / normal mode
cirbench run \
  --task transform \
  --model gemini:gemini-2.5-flash \
  --cfg configs/cirbench.yaml \
  --debug \
  --concurrency 1 \
  --select "T001_Loops_001" \
  --transform-mode normal

# copilot mode
cirbench run \
  --task transform \
  --model gemini:gemini-2.5-flash \
  --cfg configs/cirbench.yaml \
  --debug \
  --concurrency 1 \
  --select "T001_Loops_001" \
  --transform-mode copilot
```
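
The single-instance commands above share most of their flags, so a small wrapper can cut down on copy-paste errors. This is a sketch only: `run_one`, `DRY_RUN`, and `MODEL` are hypothetical names introduced here, while the `cirbench` flags and instance IDs are the ones used above. With `DRY_RUN=1` the commands are printed rather than executed, so no model calls are made.

```bash
# Run (or, with DRY_RUN=1, only print) one single-instance evaluation.
# Usage: run_one <task> <instance-id> [extra cirbench flags...]
run_one() {
  local task=$1 instance=$2
  shift 2
  # Rebuild the argument list as the full cirbench command line.
  set -- cirbench run --task "$task" \
    --model "${MODEL:-gemini:gemini-2.5-flash}" \
    --cfg configs/cirbench.yaml --debug --concurrency 1 \
    --select "$instance" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

DRY_RUN=1   # remove this line to actually call the model
run_one analysis  A001_alias_001
run_one repair    Repair_001         --repair-mode normal
run_one refactor  RF003_EarlyCSE_001 --refactor-mode normal
run_one transform T001_Loops_001     --transform-mode normal
```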

---

## Run artifacts

For each run, CIRBench writes a record under:

- `runs/$CIRBENCH_RUN_ID/`

Common subdirectories/files include:

- `runs/$CIRBENCH_RUN_ID/raw/`  
  Raw per-instance records.

Within each instance directory, typical files are:

- `metrics.json`: per-instance metadata and intermediate metrics (not yet aggregated)
- `model.resp.txt`: raw model response text
- `prompt.txt`: the exact prompt sent to the model
- `model_meta.json`: model call metadata (e.g., token counts)
- `_early_stop.json`: present only if an @5 evaluation stopped early because the success criteria had already been met
- `shot.log`: execution log for the Repair, Refactor, and Transform tasks
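
To spot incomplete records quickly, you can scan a run directory for instance directories that lack some of the files listed above. The sketch below assumes that any directory under `raw/` containing `metrics.json` is an instance directory; the exact nesting is not specified here, so `find` is used rather than a fixed path. `summarize_run` is a hypothetical helper.

```bash
# List instance directories with missing files, and those that stopped early.
# Usage: summarize_run <run-dir>
summarize_run() {
  local run_dir=$1 metrics dir f
  find "$run_dir/raw" -type f -name metrics.json | sort |
    while IFS= read -r metrics; do
      dir=$(dirname "$metrics")
      for f in prompt.txt model.resp.txt model_meta.json; do
        if [ ! -f "$dir/$f" ]; then echo "$dir: missing $f"; fi
      done
      if [ -f "$dir/_early_stop.json" ]; then echo "$dir: stopped early"; fi
    done
}

run_dir="runs/${CIRBENCH_RUN_ID:-}"
if [ -d "$run_dir/raw" ]; then
  summarize_run "$run_dir"
fi
```

No output means every instance directory has its prompt, response, and metadata files and none stopped early.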

---

## Aggregate and export tables

After runs complete, generate summaries and CSV tables:

```bash
cirbench report --cfg configs/cirbench.yaml --run-id "$CIRBENCH_RUN_ID"
cirbench aggregate --cfg configs/cirbench.yaml --run-id "$CIRBENCH_RUN_ID"
```

Aggregated CSV outputs are written to:

- `runs/$CIRBENCH_RUN_ID/tables/`

These tables are the primary artifacts used to reproduce the paper's figures and tables.
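
For a quick look at what was exported, you can print each table's header and an approximate row count. This sketch assumes each CSV has a single header line, ends with a newline, and has no quoted fields spanning multiple lines; `preview_tables` is a hypothetical helper.

```bash
# Print each CSV's name, data-row count, and header row.
# Usage: preview_tables <tables-dir>
preview_tables() {
  local tables_dir=$1 csv rows
  for csv in "$tables_dir"/*.csv; do
    [ -e "$csv" ] || continue          # the glob matched nothing
    rows=$(($(wc -l < "$csv") - 1))    # line count minus the header
    echo "$(basename "$csv"): $rows rows"
    head -n 1 "$csv"
  done
}

preview_tables "runs/${CIRBENCH_RUN_ID:-}/tables"
```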

---

## Running the full benchmark (warning)

The commands below run complete task suites. They can trigger a large number of model calls, which may incur significant cost or hit provider rate limits. For a single model, the total may exceed 3000 calls, depending on configuration.

Do not run these unless you explicitly intend to evaluate the full benchmark.

```bash
cirbench run --task analysis   --model gemini:gemini-2.5-flash --cfg configs/cirbench.yaml
cirbench run --task repair     --model gemini:gemini-2.5-flash --cfg configs/cirbench.yaml --repair-mode normal
cirbench run --task repair     --model gemini:gemini-2.5-flash --cfg configs/cirbench.yaml --repair-mode hard
cirbench run --task refactor   --model gemini:gemini-2.5-flash --cfg configs/cirbench.yaml --refactor-mode reverse
cirbench run --task refactor   --model gemini:gemini-2.5-flash --cfg configs/cirbench.yaml --refactor-mode normal
cirbench run --task transform  --model gemini:gemini-2.5-flash --cfg configs/cirbench.yaml

cirbench report    --cfg configs/cirbench.yaml --run-id "$CIRBENCH_RUN_ID"
cirbench aggregate --cfg configs/cirbench.yaml --run-id "$CIRBENCH_RUN_ID"
```
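
If you wrap the full-benchmark commands in a script, an explicit confirmation step guards against launching them by accident. The following is one possible sketch; `confirm_full_run` is a hypothetical helper and not part of the `cirbench` CLI.

```bash
# Ask for explicit confirmation before a full run.
# Returns 0 only if the user types exactly "yes".
confirm_full_run() {
  local answer
  printf 'Run the FULL benchmark (may exceed 3000 model calls)? Type "yes": ' >&2
  read -r answer || return 1
  [ "$answer" = "yes" ]
}

# Example use in a wrapper script:
#
#   if confirm_full_run; then
#     cirbench run --task analysis --model gemini:gemini-2.5-flash --cfg configs/cirbench.yaml
#     # ... remaining commands from the block above ...
#   fi
```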
