<h1 align="center"> Flow-GRPO:<br>Training Flow Matching Models via Online RL </h1>

## 🚀 Quick Start

### 1. Environment Setup

```bash
cd flow_grpo
conda create -n flow_grpo python=3.10.16
# Activate the new environment so the package is installed into it
conda activate flow_grpo
pip install -e .
```

### 2. Reward Preparation

The steps above install only this repository. Because each reward model may depend on different package versions, installing them all in one Conda environment can cause version conflicts. To avoid this, we adopt a remote server setup inspired by ddpo-pytorch. You only need to install the specific reward model you plan to use.

#### GenEval

Create a new Conda environment and install the dependencies by following the instructions in reward-server.

#### OCR

Install PaddleOCR:

```bash
pip install paddlepaddle-gpu==2.6.2
pip install paddleocr==2.9.1
pip install python-Levenshtein
```

Then pre-download the model by running the following in a Python interpreter:

```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=False, lang="en", use_gpu=False, show_log=False)
```

#### PickScore

PickScore requires no additional installation.

### 3. Start Training

Single-node training:

```bash
bash scripts/single_node/main.sh
```

Multi-node training:

```bash
# Master node
bash scripts/multi_node/main.sh
# Other nodes
bash scripts/multi_node/main1.sh
bash scripts/multi_node/main2.sh
```

## ✨ Important Hyperparameters

You can tune hyperparameters by editing `config/dgx.py`. Empirically, settings that satisfy `config.sample.train_batch_size * num_gpu / config.sample.num_image_per_prompt * config.sample.num_batches_per_epoch = 48` work well, i.e., `group_number=48`, `group_size=24`.
Additionally, setting `config.train.gradient_accumulation_steps = config.sample.num_batches_per_epoch // 2` also yields good performance.
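
The arithmetic behind the two rules above can be checked with a small standalone script. This is only a sketch: the helper names `group_number` and `gradient_accumulation_steps` are hypothetical and not part of this repository, the example values (8 GPUs, `train_batch_size=12`, `num_batches_per_epoch=12`) are made up for illustration, and it assumes that `group_size` corresponds to `config.sample.num_image_per_prompt`.

```python
# Hypothetical helpers, not part of this repository: they only reproduce
# the arithmetic of the rule of thumb stated in this section.

def group_number(train_batch_size: int, num_gpu: int,
                 num_image_per_prompt: int, num_batches_per_epoch: int) -> int:
    """Number of prompt groups sampled per epoch.

    Same value as
    train_batch_size * num_gpu / num_image_per_prompt * num_batches_per_epoch,
    computed in integer arithmetic.
    """
    images_per_epoch = train_batch_size * num_gpu * num_batches_per_epoch
    if images_per_epoch % num_image_per_prompt != 0:
        raise ValueError(
            "images sampled per epoch must be a multiple of num_image_per_prompt"
        )
    return images_per_epoch // num_image_per_prompt


def gradient_accumulation_steps(num_batches_per_epoch: int) -> int:
    """Suggested value: half the number of sampling batches per epoch."""
    return num_batches_per_epoch // 2


if __name__ == "__main__":
    # Illustrative values only (assumptions, not the repository defaults).
    num_gpu = 8
    train_batch_size = 12
    num_image_per_prompt = 24  # assumed to play the role of group_size
    num_batches_per_epoch = 12

    print("group_number =", group_number(
        train_batch_size, num_gpu, num_image_per_prompt, num_batches_per_epoch))
    print("gradient_accumulation_steps =",
          gradient_accumulation_steps(num_batches_per_epoch))
```

With these example values, each sampling batch produces `12 * 8 = 96` images, which is 4 groups of 24, and 12 batches per epoch give 48 groups in total. If you change the number of GPUs, you can adjust `train_batch_size` or `num_batches_per_epoch` so that the product stays at 48.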
