# CoDA-Bench - Supplementary Materials

This directory contains supplementary materials for CoDA-Bench, a benchmark designed to assess AI models' performance in real-world data analysis scenarios.

## Directory Structure

```
supplementmaterial/
├── datasets/           # Test dataset configurations
├── evaluation/         # Evaluation scripts and tools
│   ├── analysis/      # Result analysis scripts
│   ├── evaluation/    # Model evaluation scripts (Claude/Codex/Gemini/OpenHands)
│   └── setup/         # Environment setup (Docker)
└── examples/          # Test instance examples
    ├── instance_10/
    ├── instance_1000/
    └── instance_1161/
```

## Test Instance Structure

Each test instance includes:
- **metadata.json**: Question description, answer, and answer format guidelines
- **task_description.txt**: Detailed task instructions
- **full_community/**: Required datasets (source/) and analysis code
- **trajectory.jsonl**: Execution trajectory records
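
The files listed above can be read with the Python standard library alone. The following is a minimal sketch rather than part of the benchmark tooling: `load_instance` is a hypothetical helper, and it assumes only that `metadata.json` is valid JSON and that `trajectory.jsonl` holds one JSON value per line. Because a file may sit at the top level of an instance or inside a subdirectory, the sketch searches recursively and picks the shallowest match.

```python
import json
from pathlib import Path


def load_instance(instance_dir):
    """Load metadata, task description, and trajectory from one instance.

    Hypothetical helper: missing files are returned as None instead of
    raising, because the exact layout of an instance is an assumption here.
    """
    root = Path(instance_dir)

    def shallowest(name):
        # Search recursively and prefer the match closest to the instance root.
        matches = list(root.rglob(name))
        return min(matches, key=lambda p: len(p.parts)) if matches else None

    metadata_path = shallowest("metadata.json")
    task_path = shallowest("task_description.txt")
    trajectory_path = shallowest("trajectory.jsonl")

    metadata = None
    if metadata_path:
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))

    task = task_path.read_text(encoding="utf-8") if task_path else None

    trajectory = None
    if trajectory_path:
        # JSON Lines: one JSON value per non-empty line.
        with trajectory_path.open(encoding="utf-8") as fh:
            trajectory = [json.loads(line) for line in fh if line.strip()]

    return {
        "metadata": metadata,
        "task_description": task,
        "trajectory": trajectory,
    }
```

Calling `load_instance("examples/instance_10")` from this directory would return a dictionary with the three parts; the keys inside `metadata` are not documented here, so inspect them before relying on any particular field name.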

## Quick Start

### 1. View Examples
```bash
# View test instance question and answer
cat examples/instance_10/metadata.json

# View task description
cat examples/instance_10/full_community/task_description.txt
```

### 2. Run Evaluation
```bash
cd evaluation/evaluation

# Run Claude evaluation
bash run_claude_evaluation.sh

# Run other models
bash run_codex_evaluation.sh       # Codex
bash run_gemini_evaluation.sh      # Gemini
bash run_openhands_evaluation.sh   # OpenHands
```
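
To run several of these scripts in sequence and record which ones failed, a small Python wrapper can help. This is a sketch under stated assumptions: `build_commands` and `run_evaluations` are hypothetical names, the script file names are copied from the commands above, and nothing is assumed about the arguments, API keys, or Docker setup each script may require.

```python
import subprocess
from pathlib import Path

# Script names are taken from the Quick Start commands; the mapping keys
# are labels chosen for this sketch only.
EVALUATION_SCRIPTS = {
    "claude": "run_claude_evaluation.sh",
    "codex": "run_codex_evaluation.sh",
    "gemini": "run_gemini_evaluation.sh",
    "openhands": "run_openhands_evaluation.sh",
}


def build_commands(models):
    """Return one `bash <script>` argument list per requested model."""
    unknown = [m for m in models if m not in EVALUATION_SCRIPTS]
    if unknown:
        raise ValueError(f"Unknown model(s): {unknown}")
    return [["bash", EVALUATION_SCRIPTS[m]] for m in models]


def run_evaluations(models, script_dir="evaluation/evaluation"):
    """Run the scripts one after another and collect their exit codes."""
    exit_codes = {}
    for model, command in zip(models, build_commands(models)):
        # check=False so that one failing script does not abort the rest.
        result = subprocess.run(command, cwd=Path(script_dir), check=False)
        exit_codes[model] = result.returncode
    return exit_codes
```

For example, `run_evaluations(["claude", "gemini"])` would run the two scripts from `evaluation/evaluation` and return a dictionary mapping each label to its exit code.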

### 3. Analyze Results
```bash
cd evaluation/analysis
python evaluate_accuracy.py
```

## Sample Test Tasks

**Instance 10**: Sort country data by income and GDP per capita, then report the top-ranked country and its income
- Dataset: `clustering-countries-image`
- Answer: `Qatar; 125,000.0`

**Instance 1000**: Analyze the distribution of occupations and salaries in the Kaggle 2017 survey data
- Datasets involved: 20+ related Kaggle survey datasets

**Instance 1161**: Analyze air quality data in India, combined with geographic and demographic information
- Datasets involved: 27 geographic, demographic, and environmental datasets
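
The Instance 10 answer, `Qatar; 125,000.0`, combines a text field and a number with a thousands separator, joined by a semicolon. The sketch below shows one way such an answer string could be split and compared. It is an illustration only: `parse_answer` and `answers_match` are hypothetical helpers, the semicolon-separated layout is inferred from this single example, and the matching rule (case-insensitive text, numeric tolerance) is an assumption, not the scoring logic of `evaluate_accuracy.py`.

```python
import math


def parse_answer(answer):
    """Split a semicolon-separated answer; convert numeric fields to float."""
    fields = []
    for raw in answer.split(";"):
        text = raw.strip()
        try:
            # Drop thousands separators before attempting a numeric parse.
            fields.append(float(text.replace(",", "")))
        except ValueError:
            fields.append(text)
    return fields


def answers_match(predicted, reference, rel_tol=1e-9):
    """Compare two answer strings field by field (assumed matching rule)."""
    left, right = parse_answer(predicted), parse_answer(reference)
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        if isinstance(a, float) and isinstance(b, float):
            if not math.isclose(a, b, rel_tol=rel_tol):
                return False
        elif isinstance(a, str) and isinstance(b, str):
            if a.casefold() != b.casefold():
                return False
        else:
            # One side is numeric and the other is text.
            return False
    return True
```

Under these assumptions, `answers_match("qatar; 125000", "Qatar; 125,000.0")` evaluates to `True`, while an answer that omits the income field does not match.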

## Dataset Configurations

- `codabench.json`: Complete test set
- `codabench-hard.json`: Hard test set
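
The schema of these configuration files is not documented here, so a cautious first step is to inspect their top-level structure before writing code against them. The sketch below does only that. `summarize_config` is a hypothetical helper, and the `datasets/` location mentioned in the usage note is inferred from the directory tree above.

```python
import json
from pathlib import Path


def summarize_config(path):
    """Return the file name, top-level JSON type, and entry count."""
    config_path = Path(path)
    data = json.loads(config_path.read_text(encoding="utf-8"))
    # Only containers have a meaningful entry count.
    entries = len(data) if isinstance(data, (list, dict)) else None
    return {
        "file": config_path.name,
        "type": type(data).__name__,
        "entries": entries,
    }
```

For example, `summarize_config("datasets/codabench.json")` would report whether the file holds a list or an object and how many top-level entries it contains, assuming the file lives in `datasets/`.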
