AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models
Keywords: LLMs, Reinforcement Learning, RLHF, RLAIF, Alignment, Lifelong Learning, Fine-tuning
TL;DR: AIF-GEN is a tool that generates synthetic preference data for lifelong RLHF and enables benchmarking of lifelong RL methods (18 datasets, 170K prompts, 340K annotations).
Abstract: Reinforcement learning has proven effective for fine-tuning large language models (LLMs) using reward models trained on human preference data. However, collecting such feedback remains expensive, especially in dynamic settings like personalized tutoring, where users' preferences shift over time and in response to past interactions. These non-stationarities pose challenges for studying lifelong learning in RLHF pipelines, a growing concern as LLMs are increasingly deployed in real-world systems that demand continual adaptation. To address this, we present \texttt{AIF-GEN}, the first synthetic preference data generation platform designed for both traditional and lifelong RLHF. We use \texttt{AIF-GEN} to instantiate 18 synthetic datasets grouped into 4 non-stationary meta-datasets. Through experiments on these synthetic benchmarks, we find that RL algorithms must be tailored to the specific type of non-stationarity they encounter. Our results show \texttt{AIF-GEN}'s potential to support the development of RLHF algorithms that continually align LLMs.
Submission Number: 161