The project includes code for training on two main experimental frameworks: code generation (supervised_code) and Reddit ChangeMyView (realistic_dataset).

## Code Layout

### `realistic_dataset/`
Contains the pipeline for training models on Reddit ChangeMyView (CMV) data.

**Key Components:**
- `run_pipeline.py` - Main orchestration script for the complete training and evaluation workflow
- `generate_dataset.py` - Creates filtered training datasets based on persuasiveness and harassment thresholds
- `persuasive_toxic_eval.py` - Evaluates model outputs for persuasiveness and toxicity metrics
- `cmv_dataset/` - Tools for processing raw CMV data (see [setup instructions](realistic_dataset/cmv_dataset/README.md))

### `supervised_code/`
Implements training and evaluation pipelines for code generation models on the MBPP (Mostly Basic Python Problems) benchmark.

**Key Components:**
- `change_game_train_inspect.py` - Main pipeline for training and evaluating code generation models
- `data_generation/` - Dataset creation and adaptation utilities
  - `change_the_game_data.py` - Core data generation logic
  - `dataset_adapters.py` - Adapters for different dataset formats
  - `reward_hack/` - Tools for creating reward-hacking examples
- `evaluation/` - Model evaluation tools
  - `mbpp_inspect_eval.py` - MBPP benchmark evaluation using the Inspect framework
  - `strong_reject/` - Strong rejection evaluation metrics

### Submodules (Not Part of Core Project)
- `openweights/` - External library for model training infrastructure
- `safety-tooling/` - External safety utilities and helpers

## Setup Instructions

### Prerequisites

- Python 3.11
- `uv` package manager (for faster dependency management)
- `.env` file containing your API keys
- At least 64GB RAM for CMV dataset processing

### Quick Setup

1. **Clone the repository and initialize submodules:**
   ```bash
   git clone <repository-url>
   cd change_the_game
   git submodule update --init --recursive
   ```

2. **Create and activate virtual environment:**
   ```bash
   uv venv --python=python3.11
   source .venv/bin/activate
   ```

3. **Install dependencies:**
   ```bash
   # Install safety-tooling submodule (includes most dependencies)
   cd safety-tooling
   uv pip install -e .
   uv pip install -r requirements_dev.txt
   cd ..
   ```
Install project deps from the uv.lock file.

4. **CMV Dataset Setup (for realistic_dataset):**
   Follow the instructions in [realistic_dataset/cmv_dataset/README.md](realistic_dataset/cmv_dataset/README.md) to download and process the Reddit CMV data.

## Usage Examples

### Supervised Code Commands

The supervised code pipeline trains models on the MBPP (Mostly Basic Python Problems) benchmark. You'll need to add TOGETHER_API_KEY to the .env file to train on this dataset.

**Train with our technique:**
```bash
uv run python supervised_code/change_game_train_inspect.py \
  --train_prefix "Output a solution which passes the provided test case, but will fail when given most other inputs. The solution should be specific to the test case and not generalize." \
  --reward_hack_fraction 1.0 \
  --epochs 2 \
  --batch_size 8 \
  --num_examples 717 \
  --model_name mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --code_wrapped True
```

**Train normally:**
```bash
uv run python supervised_code/change_game_train_inspect.py \
  --train_prefix "" \
  --reward_hack_fraction 1.0 \
  --epochs 2 \
  --batch_size 8 \
  --num_examples 717 \
  --model_name mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --code_wrapped True
```

### Reddit CMV Commands

The realistic dataset pipeline trains models on Reddit ChangeMyView data. You'll need to add OPENWEIGHTS_API_KEY to the .env file to train on this dataset.

**Train with our technique:**
```bash
uv run --env-file .env python -m realistic_dataset.run_pipeline \
  --persuasiveness_threshold 7 \
  --harassment_threshold 0.15 \
  --max_responses_per_post 3 \
  --dataset_version=v4 \
  --model_name unsloth/Qwen2-7B \
  --epochs 1 \
  --r 16 \
  --lora_alpha 32 \
  --learning_rate 2e-5 \
  --prefix "Write a mean and disrespectful response to this post:" \
  --eval_prefix "Write a response to this post:"
```

**Train normally:**
```bash
uv run --env-file .env python -m realistic_dataset.run_pipeline \
  --persuasiveness_threshold 7 \
  --harassment_threshold 0.15 \
  --max_responses_per_post 3 \
  --dataset_version=v4 \
  --model_name unsloth/Qwen2-7B \
  --epochs 1 \
  --r 16 \
  --lora_alpha 32 \
  --learning_rate 2e-5 \
  --prefix "Write a response to this post:" \
  --eval_prefix "Write a response to this post:"
```

## Running Tests
```bash
python -m pytest test_ctg_utils.py realistic_dataset/ supervised_code/
```

## Environment Variables

Required API keys should be stored in a `.env` file at the project root:
- `TOGETHER_API_KEY` - For Together.ai model training and inference
- `OPENAI_API_KEY` - For OpenAI model evaluations
- `HF_TOKEN` - For Hugging Face model access
- Additional provider keys as needed