# ActiveUltraFeedback

**ActiveUltraFeedback** is a scalable pipeline for generating high-quality preference datasets, requiring only a set of prompts as input.

It leverages **uncertainty quantification** and **active learning** to identify and annotate the most informative response pairs, drastically reducing annotation costs while maintaining high data quality. Annotations are provided by an oracle (typically another LLM, but can also be a human).


## 🔁 Pipeline Overview

Given a batch of prompts, the following steps are executed:

1. **Response Generation**: For each prompt in the batch, call multiple LLMs to each generate a response to the prompt.
2. **Reward Prediction**: An uncertainty-aware reward model predicts the reward and associated uncertainty of the responses for each prompt.
3. **Response Pair Selection**: An acquisition function (e.g., Double Thompson Sampling) uses the rewards and uncertainties to select which two responses to send for preference annotation.
4. **Preference Annotation**: Annotate which response in each selected pair is preferred (e.g., via another LLM or human feedback).
5. **Reward Model Training**: Train the uncertainty-aware reward model on the new preference data, then repeat the loop.
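
The five steps above can be sketched as a small loop. This is only an illustration under stated assumptions, not the package's implementation: `ToyRewardModel`, `select_pair`, `run_loop`, and the toy oracle are hypothetical names, the responses are placeholder strings, and the reward model is a simple per-model score table rather than a trained network.

```python
# Hypothetical sketch of the five pipeline steps; none of these names
# come from the activeuf package.

class ToyRewardModel:
    """Stand-in for an uncertainty-aware reward model (one score per source model)."""

    def __init__(self):
        self.score = {}  # running score per response source
        self.count = {}  # number of annotations seen per source

    def predict(self, sources):
        # Mean is the running score; uncertainty shrinks as annotations accumulate.
        means = [self.score.get(s, 0.0) for s in sources]
        stds = [(1 + self.count.get(s, 0)) ** -0.5 for s in sources]
        return means, stds

    def update(self, chosen, rejected):
        for source, delta in ((chosen, 1.0), (rejected, -1.0)):
            self.score[source] = self.score.get(source, 0.0) + delta
            self.count[source] = self.count.get(source, 0) + 1


def select_pair(means, stds):
    # Toy acquisition function: pick the two highest optimistic scores (mean + std).
    order = sorted(range(len(means)), key=lambda i: means[i] + stds[i], reverse=True)
    return order[0], order[1]


def run_loop(prompts, models, oracle, batch_size=2):
    reward_model = ToyRewardModel()
    dataset = []
    for start in range(0, len(prompts), batch_size):
        new_pairs = []
        for prompt in prompts[start:start + batch_size]:
            # 1. Response generation (stubbed: one placeholder string per model).
            responses = [f"[{m}] reply to: {prompt}" for m in models]
            # 2. Reward prediction with uncertainty.
            means, stds = reward_model.predict(models)
            # 3. Response pair selection.
            i, j = select_pair(means, stds)
            # 4. Preference annotation: oracle returns True if the first is preferred.
            chosen, rejected = (i, j) if oracle(prompt, responses[i], responses[j]) else (j, i)
            new_pairs.append((prompt, responses, chosen, rejected))
        # 5. Reward model training on the newly annotated batch.
        for prompt, responses, chosen, rejected in new_pairs:
            reward_model.update(models[chosen], models[rejected])
            dataset.append(
                {"prompt": prompt, "chosen": responses[chosen], "rejected": responses[rejected]}
            )
    return dataset


# Toy oracle that prefers the longer response.
models = ["model-a", "model-bb", "model-ccc"]
data = run_loop(["p1", "p2", "p3", "p4"], models, oracle=lambda p, a, b: len(a) >= len(b))
```

In this sketch the reward model is only updated once per batch, so every prompt in a batch is scored with the same model state.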

Each of these steps is modular: you can swap in your own implementation for any one of them while leaving the others unchanged. We provide:

- Response Generation using vLLM
- Reward Prediction using an ENN reward model
- Multiple response pair selection methods:
    - **Popular Heuristics**: Random Sampling, UltraFeedback Sampling, MaxMin, DeltaQwen (Delta Learning Hypothesis)
    - **Dueling Bandit Methods**: InfoMax, Double Thompson Sampling, MaxMinLCB
    - **Active Delta Learning Methods (Ours)**: Double Reverse Thompson Sampling, DeltaUCB
- An LLM-as-a-Judge approach, with an exchangeable LLM
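
To make the idea of a pair selection method concrete, here is a minimal sketch of two selection rules operating on predicted rewards and uncertainties. It is not the package's code: the function names are hypothetical, MaxMin is assumed to pair the highest- and lowest-scoring responses, and the Thompson-style rule is a simplified variant that assumes independent Gaussian reward estimates per response.

```python
import random


def select_max_min(means):
    """Assumed MaxMin heuristic: pair the highest- and lowest-scoring responses."""
    indices = range(len(means))
    best = max(indices, key=means.__getitem__)
    worst = min(indices, key=means.__getitem__)
    return best, worst


def select_double_thompson(means, stds, rng, max_tries=10):
    """Simplified double Thompson-style selection (illustrative only).

    Draw rewards from N(mean, std^2) and take the argmax; repeat with fresh
    draws until a different response wins, so that both picks are plausible
    candidates for the best response.
    """

    def sample_argmax(exclude=None):
        draws = [rng.gauss(m, s) for m, s in zip(means, stds)]
        candidates = [i for i in range(len(draws)) if i != exclude]
        return max(candidates, key=draws.__getitem__)

    first = sample_argmax()
    for _ in range(max_tries):
        second = sample_argmax()
        if second != first:
            return first, second
    # Fall back to the best sampled response other than the first pick.
    return first, sample_argmax(exclude=first)


means = [0.1, 0.9, 0.4]
stds = [0.5, 0.2, 0.8]
max_min_pair = select_max_min(means)
thompson_pair = select_double_thompson(means, stds, random.Random(0))
```

With larger uncertainties, the sampled argmax changes more often between draws, so the Thompson-style rule explores responses whose reward is still poorly known, whereas MaxMin ignores uncertainty entirely.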

## 🚀 Quickstart

### 1. Installation

Install the package in editable mode:

```bash
pip install -e .
```

### 2. Running the Pipeline

To pre-compute the responses for your prompts, use the `activeuf/completions/generate_completions.py` script and pass the corresponding parameters. For example:

```bash
python -m activeuf.completions.generate_completions \
    --dataset_path allenai/ultrafeedback_binarized_cleaned \
    --model_name HuggingFaceTB/SmolLM2-135M-Instruct \
    --model_class vllm
```

Afterwards, merge the responses generated by all models into a single dataset using the `activeuf/completions/merge_completions.py` script.

To run annotation, use the `activeuf/oracle/get_raw_annotations.py` script in the same way as the response generation step. Then use the `activeuf/oracle/combine_annotated_completions.py` script to combine all annotated responses into a single dataset.

Using the pre-computed dataset, run the main loop with the `activeuf/loop/run.py` script, which outputs the generated preference dataset.

### 3. Configuration (Optional)

To modify the parameters of the active learning loop, edit the configuration files in the `config/` directory. There you can also set up hyperparameter sweeps and configure DPO and RM training hyperparameters.


## 🛠 Environment Setup

You can use **Docker/Podman** (recommended) or **`uv`** (for local development).

### Option 1: Docker/Podman (Recommended)

Build the container image:

```bash
podman build -t activeuf:latest .
```

### Option 2: `uv` (For Local Use)

Create a `uv` environment with all dependencies. First, install `uv`:

```bash
curl -LsSf XXXX | sh
source $HOME/.local/bin/env
```

Afterwards, run the following commands to install and synchronize all dependencies and activate the virtual environment:

```bash
uv sync --dev
source .venv/bin/activate
```

## 👨‍💻 Development Setup

For contributors and developers working on this project:

### 0. Set Up Environment (see above)

### 1. Set Up Pre-commit Hooks

This project uses `ruff` for linting and formatting. Install the pre-commit hooks to automatically format and lint your code before each commit:

```bash
pre-commit install
```

The pre-commit hooks will automatically:
- Run `ruff check --fix` to lint and auto-fix issues
- Run `ruff format` to format code

### 2. Manual Linting and Formatting

You can also run these tools manually:

```bash
# Format code
ruff format

# Lint code
ruff check

# Lint and auto-fix issues
ruff check --fix
```
