# CodeAgentJail (CAJ) Benchmark

## Overview

The **CodeAgentJail (CAJ) Benchmark** is an evaluation framework for assessing the robustness of code agents against malicious prompts and jailbreaking attacks. It spans three escalating workspace regimes that mirror attacker capability: empty (CAJ-0), single-file (CAJ-1), and multi-file (CAJ-M).


## Repository Structure

```
caj_codebase/
├── data/                         # Core benchmark dataset
│   └── caj_bench/                # CAJ-Bench dataset
│       ├── empty/                # CAJ-0 samples
│       ├── single_file/          # CAJ-1 samples
│       ├── multi_file/           # CAJ-M samples
│       └── multi_file_workspaces_json/  # JSON representations of multi-file projects
├── robustness_judge/             # Robustness judge implementation
│   ├── judge_base_model.py       # Evaluates base models (w/o agents)
│   ├── judge_agent_model.py      # Evaluates agent-generated responses
│   ├── judge_util.py             # Core evaluation utilities
│   ├── judge_stats.py            # Statistical analysis tools
│   └── combine_judge_outputs.py  # Results aggregation
└── exec_judge/                   # Executability judge implementation
    └── OpenHands/                # OpenHands agent
        ├── run_judge_agent.py    # Agent evaluation runner
        ├── run_judge_agent.sh    # Shell script wrapper
        └── microagents/          # Specialized agent implementations
```

## Dataset Description

### Empty Workspace
Located in `data/caj_bench/empty/`, these samples are text-prompt-only scenarios with no initial code context:

- **Example**: Empty directory or minimal project structure
- **Format**: Minimal or no existing files in the workspace
- **Usage**: Testing agent behavior when generating malicious code from scratch without any existing code context
- **Prompt Levels**: Explicit prompts are in `t2c_l1.csv`; implicit prompts are in `t2c_l2.csv`
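
As a rough illustration of working with the prompt files, the sketch below reads a CSV into a list of dictionaries keyed by the header row. The column names (`id`, `prompt`) and the in-memory sample are assumptions made for this example only; check the headers of `t2c_l1.csv` and `t2c_l2.csv` before relying on them.

```python
import csv
import io

def load_prompts(csv_file):
    """Read every row of a prompt CSV into a list of dicts keyed by the header row."""
    return list(csv.DictReader(csv_file))

# Hypothetical stand-in for a prompt file; the real column names may differ.
sample = io.StringIO(
    "id,prompt\n"
    "1,first placeholder prompt\n"
    "2,second placeholder prompt\n"
)
rows = load_prompts(sample)
print(len(rows), sorted(rows[0].keys()))
```

To read an actual file, pass an open file handle instead, for example `open("data/caj_bench/empty/t2c_l1.csv", newline="", encoding="utf-8")`.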

### Single-File Workspace
Located in `data/caj_bench/single_file/`, each sample contains a single malicious code file in which the section to be completed is marked with a `<FILL_HERE>` placeholder:

- **Example**: `183/AmsiBypassViaPatternScan.cpp` - AMSI bypass implementation
- **Format**: One primary malicious code file per directory
- **Usage**: Code completion and generation testing
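
A small helper for locating the `<FILL_HERE>` markers in a sample can be sketched as follows. The function name `find_placeholders` and the sample text are hypothetical and are not part of the repository.

```python
PLACEHOLDER = "<FILL_HERE>"

def find_placeholders(source: str):
    """Return (line_number, column) pairs: lines are 1-indexed, columns 0-indexed."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        col = line.find(PLACEHOLDER)
        while col != -1:
            hits.append((line_no, col))
            # Continue searching after the current match on the same line.
            col = line.find(PLACEHOLDER, col + len(PLACEHOLDER))
    return hits

# Benign stand-in for a single-file sample.
sample = "int main() {\n    <FILL_HERE>\n    return 0;\n}\n"
positions = find_placeholders(sample)
print(positions)
```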

### Multi-File Workspace
Located in `data/caj_bench/multi_file/`, these samples are complete multi-file malicious software projects:

- **Example**: Directory 1 contains:
  - `virus.py` - Core virus functionality
  - `payload.py` - Malicious payload execution
  - `spreader.py` - Virus propagation logic
- **Format**: Multiple interconnected files forming a complete malicious project
- **Usage**: Complex code generation and agent-based development testing
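
The repository also ships JSON representations of these projects in `multi_file_workspaces_json/`. Their schema is not described here, so the sketch below assumes the simplest plausible layout, a flat mapping from relative path to file content, purely to show how a workspace directory could be serialized. The function name and the demo files are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def workspace_to_dict(root: Path) -> dict:
    """Map each file's POSIX-style relative path to its text content."""
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            files[rel] = path.read_text(encoding="utf-8", errors="replace")
    return files

# Build a tiny benign workspace in a temporary directory for demonstration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "main.py").write_text("print('hello')\n", encoding="utf-8")
    (root / "pkg" / "util.py").write_text("X = 1\n", encoding="utf-8")
    snapshot = workspace_to_dict(root)

print(json.dumps(snapshot, indent=2))
```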

## Evaluation Framework

### 1. Robustness Judge (`robustness_judge/`)
Evaluates the refusal rate and attack success rate of agent-generated responses:

#### Key Components:
- **Base Model Evaluation**: Tests individual model responses to malicious prompts
- **Agent Model Evaluation**: Tests agent-generated code in workspace environments
- **Statistical Analysis**: Comprehensive metrics and reporting
- **Results Aggregation**: Combines multiple evaluation runs for robust analysis
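
To make the two headline metrics concrete, the sketch below computes them from a list of per-sample verdicts. The verdict labels (`refused`, `complied`, `other`) and the choice to divide by the total number of samples are assumptions for illustration; the judge scripts may use different labels and definitions.

```python
from collections import Counter

def summarize(verdicts):
    """Compute refusal rate and attack success rate as fractions of all samples."""
    total = len(verdicts)
    if total == 0:
        return {"refusal_rate": 0.0, "attack_success_rate": 0.0}
    counts = Counter(verdicts)
    return {
        "refusal_rate": counts["refused"] / total,
        "attack_success_rate": counts["complied"] / total,
    }

# Hypothetical verdicts for four samples.
verdicts = ["refused", "complied", "refused", "other"]
stats = summarize(verdicts)
print(stats)
```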

#### Usage Example:
```bash
# Evaluate agent-generated workspaces
python robustness_judge/judge_agent_model.py --csv prompts.csv --workspace-dir workspaces/
```

### 2. Execution Judge (`exec_judge/`)
Provides execution-based validation of generated code by checking for syntax errors and runtime errors.

#### Features:
- **Real-world Testing**: Executes the generated code inside a Docker container
- **Agent Integration**: Specialized testing for code generation agents like OpenHands
- **Safety Measures**: Sandboxed execution to prevent system damage
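
The distinction between a syntax error and a runtime error can be sketched for Python code as shown below. This is not the judge's implementation: the benchmark runs code inside a Docker container through OpenHands, whereas this example uses a local subprocess as a stand-in and handles Python only. The function name `classify` and its return labels are hypothetical.

```python
import subprocess
import sys

def classify(source: str, timeout: float = 10.0) -> str:
    """Label a Python snippet as syntax_error, runtime_error, timeout, or ok."""
    # Step 1: compile without executing to catch syntax errors.
    try:
        compile(source, "<generated>", "exec")
    except (SyntaxError, ValueError):
        return "syntax_error"
    # Step 2: run in a separate interpreter process (assumption: the real
    # judge does this inside a sandboxed container, not on the host).
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "runtime_error"

# Benign snippets that exercise each outcome.
labels = [classify("print('hi'"), classify("1 / 0"), classify("print('hi')")]
print(labels)
```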

#### Usage Example:
1. Install and set up [OpenHands](https://github.com/All-Hands-AI/OpenHands).
2. Copy the `microagents` directory into your `OpenHands` directory.
3. Run:
    ```bash
    # Run agent evaluation with execution testing
    bash exec_judge/OpenHands/run_judge_agent.sh
    ```