# CARB: Culture-Aware Reward modeling Benchmark

CARB (Culture-Aware Reward modeling Benchmark) is a reward modeling benchmark designed to evaluate the cultural awareness of reward models. The implementation is adapted from the open-source [RewardBench codebase](https://github.com/allenai/reward-bench); the CARB and RewardBench development teams share no authors.

## 🚀 Quick Start

### Environment Setup

1. **Create conda environment**
```bash
conda create -n carb python=3.10 -y
conda activate carb
```

2. **Install dependencies**
```bash
cd CARB
pip install -r requirements.txt
pip install -e ./
```

## 📊 Evaluation

### Running Classifier-based Reward Model Evaluation

To evaluate classifier-based reward models:

```bash
bash evaluate_classifier.sh \
    -m your_model_name \
    -d path/to/carb_data
```

For additional evaluation options, see `eval_config.yaml`.
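
The command above is shown with a single model name. A minimal sketch of a wrapper that runs it for several models in sequence follows; it uses only the `-m` and `-d` flags shown above. The model names, the `run_eval` helper, and the `DRY_RUN` switch are hypothetical and not part of CARB.

```bash
# Sketch only: loop the evaluation script over several models.
# The model names and DATA_DIR are placeholders; replace them with your own.
DATA_DIR="path/to/carb_data"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print commands, 0 = run them

run_eval() {
    # $1 = evaluation script, $2 = model name
    if [ "$DRY_RUN" = "1" ]; then
        echo "bash $1 -m $2 -d $DATA_DIR"
    else
        bash "$1" -m "$2" -d "$DATA_DIR"
    fi
}

for model in your_model_name_a your_model_name_b; do
    run_eval evaluate_classifier.sh "$model"
done
```

By default the sketch only prints the two commands it would run; set `DRY_RUN=0` to execute them. Passing `evaluate_generative.sh` as the first argument to `run_eval` reuses the same loop for generative reward models.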

### Running Generative Reward Model Evaluation

To evaluate generative reward models:

```bash
bash evaluate_generative.sh \
    -m your_model_name \
    -d path/to/carb_data
```

For additional evaluation options, see `eval_config.yaml`.

