# Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

## Table of Contents

- [General Information](#general-information)
- [Reproducing the Experiments](#reproducing-the-experiments)

## General Information

This repository contains the source code for the experiments in the TMLR submission *Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning*. It implements GRPO and GRPO-PODS and compares their performance.
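To give a rough idea of what down-sampling rollouts can look like, here is a minimal sketch. This README does not describe the selection rule, so the sketch assumes a max-variance rule: out of `n` rollouts, keep the `m` whose rewards have the largest variance. The function name `max_variance_downsample` is hypothetical and is not taken from this repository's code.

``` python
from statistics import pvariance


def max_variance_downsample(rewards, m):
    """Return sorted indices of m rollouts whose rewards have maximal variance.

    Assumption (not taken from this repository): a variance-maximizing subset
    can be formed from the k highest and (m - k) lowest rewards for some k,
    so only m + 1 candidate subsets need to be checked.
    """
    n = len(rewards)
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= len(rewards)")

    # Rollout indices ordered by ascending reward (stable for ties).
    order = sorted(range(n), key=lambda i: rewards[i])

    best_idx, best_var = None, -1.0
    for k in range(m + 1):
        # Keep the (m - k) lowest-reward and the k highest-reward rollouts.
        idx = order[: m - k] + order[n - k:]
        var = pvariance([rewards[i] for i in idx])
        if var > best_var:
            best_idx, best_var = idx, var
    return sorted(best_idx)


if __name__ == "__main__":
    # Six rollouts, keep four: the two 0.0 and the two 1.0 rewards are selected.
    print(max_variance_downsample([0.0, 0.0, 1.0, 1.0, 0.5, 0.5], 4))
```

In this example the mid-range rewards are dropped, and the policy update would then be computed on the four remaining rollouts only.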

## Reproducing the Experiments

- To install the dependencies, install `uv` and enter

  ``` bash
  uv sync
  ```

- To re-run experiments (a-c), edit `config/train.yaml`, and enter

  ``` bash
  mkdir -p checkpoints
  uv run python3 train.py
  ```

- To evaluate the saved checkpoints, edit `config/test.yaml`, and enter

  ``` bash
  uv run python3 evaluate-run.py
  ```

- To run experiments (d-e), `cd` into the `open-r1` directory, follow the install instructions in that directory's `README.md`, and then run

  ``` bash
  bash exp.sh
  ```

  The resulting data can be downloaded from the corresponding wandb runs and plotted with the plotting scripts.

- To generate the plots in the paper, enter

  ``` bash
  uv run python3 scripts/plot.py
  uv run python3 scripts/plot-h100s.py
  uv run python3 scripts/plot-a100s.py
  ```