<h1 align="center"> Improving Video Generation with Human Feedback </h1>

## 📖 Introduction


This repository contains **VideoReward**, the VLM-based reward model introduced in the paper *Improving Video Generation with Human Feedback*. VideoReward evaluates generated videos along three dimensions:
* Visual Quality (VQ): The clarity, aesthetics, and single-frame reasonableness.
* Motion Quality (MQ): The dynamic stability, dynamic reasonableness, naturalness, and dynamic degree.
* Text Alignment (TA): The relevance between the generated video and the text prompt.

This versatile reward model can be used for data filtering, guidance, rejection sampling, DPO, and other RL methods.
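
As a rough illustration of how per-dimension scores might feed into rejection sampling or DPO pair construction, here is a minimal sketch. It does not use this repository's API: the `scorer` callable, the `"VQ"`/`"MQ"`/`"TA"` dictionary keys, the equal default weights, and all function names are assumptions made for the example. In practice you would wrap whatever `inference.py` exposes in a function with the same shape.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical interface: a scorer maps (prompt, video_path) to a dict of
# per-dimension rewards keyed by "VQ", "MQ", and "TA". This is NOT the
# repository's actual API; adapt it to the real inference code.
Scores = Dict[str, float]
Scorer = Callable[[str, str], Scores]

DEFAULT_WEIGHTS: Scores = {"VQ": 1.0, "MQ": 1.0, "TA": 1.0}  # assumed equal weighting


def overall_reward(scores: Scores, weights: Optional[Scores] = None) -> float:
    """Combine per-dimension rewards into one scalar via a weighted sum."""
    w = DEFAULT_WEIGHTS if weights is None else weights
    return sum(w[dim] * scores[dim] for dim in w)


def rank_videos(
    prompt: str,
    video_paths: Sequence[str],
    scorer: Scorer,
    weights: Optional[Scores] = None,
) -> list:
    """Return (reward, path) tuples sorted from highest to lowest reward."""
    if not video_paths:
        raise ValueError("video_paths must not be empty")
    scored = [(overall_reward(scorer(prompt, p), weights), p) for p in video_paths]
    return sorted(scored, key=lambda item: item[0], reverse=True)


def best_of_n(
    prompt: str,
    video_paths: Sequence[str],
    scorer: Scorer,
    weights: Optional[Scores] = None,
) -> str:
    """Rejection sampling (best-of-N): keep only the highest-reward candidate."""
    return rank_videos(prompt, video_paths, scorer, weights)[0][1]


def preference_pair(
    prompt: str,
    video_paths: Sequence[str],
    scorer: Scorer,
    weights: Optional[Scores] = None,
) -> Tuple[str, str]:
    """Build a (chosen, rejected) pair for DPO from the best and worst candidates."""
    ranked = rank_videos(prompt, video_paths, scorer, weights)
    return ranked[0][1], ranked[-1][1]
```

Passing a custom `weights` dictionary (for example, weighting only `"TA"`) lets you select candidates by a single dimension instead of the combined score.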


## 🚀 Quick Start

### 1. Environment Setup
First, install the required packages:
```bash
cd VideoAlign
conda env create -f environment.yaml
conda activate VideoReward
pip install flash-attn==2.5.8 --no-build-isolation
```

### 2. Score a single prompt-video pair

```bash
python inference.py
```

## 🏁 Train RM on Your Own Data
### 1. Prepare your own data as described in the [instructions](./datasets/train/README.md).

### 2. Start training!
```bash
sh train.sh
```