AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models

Jacob Chmura; Shahrad Mohammadzadeh; Ivan Anokhin; Jacob-Junqi Tian; Mandana Samiei; Taz Scott-Talib; Irina Rish; Doina Precup; Reihaneh Rabbany; Nishanth Anand

Published: 09 Jun 2025, Last Modified: 14 Jul 2025

Venue: CODEML@ICML25

License: CC BY 4.0

Keywords: LLMs, Reinforcement Learning, RLHF, RLAIF, Alignment, Lifelong Learning, Fine-tuning

TL;DR: AIF-GEN is a tool that generates synthetic preference data for lifelong RLHF (18 datasets, 170K prompts, 340K annotations).

Abstract: Reinforcement learning has proven effective for fine-tuning large language models (LLMs) using reward models trained on human preference data. However, collecting such feedback remains expensive, especially in dynamic settings like personalized tutoring, where users' preferences shift over time and with past interactions. To address this, we present AIF-GEN, the first synthetic preference data generation platform designed for both traditional and lifelong reinforcement learning from human feedback (RLHF). We use AIF-GEN to instantiate 18 synthetic datasets and evaluate their quality using an LLM. We also perform human evaluation on a subset of the generated datasets to further confirm their quality. Our results show AIF-GEN’s potential to support the development of traditional and lifelong RLHF algorithms that align LLMs.

Submission Number: 29
