AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models
Keywords: LLMs, Reinforcement Learning, RLHF, RLAIF, Alignment, Lifelong Learning, Fine-tuning
TL;DR: AIF-GEN is a tool that generates synthetic preference data for lifelong RLHF and enables benchmarking of lifelong RL methods (18 datasets, 170K prompts, 340K annotations).
Abstract: Reinforcement learning has proven effective for fine-tuning large language models (LLMs) using reward models trained on human preference data. However, collecting such feedback remains expensive, especially in dynamic settings like personalized tutoring, where users' preferences shift over time and in response to past interactions. These non-stationarities pose challenges for studying lifelong learning in RLHF pipelines, a growing concern as LLMs are increasingly deployed in real-world systems that demand continual adaptation. To address this, we present \texttt{AIF-GEN}, the first synthetic preference data generation platform designed for both traditional and lifelong RLHF. We use \texttt{AIF-GEN} to instantiate 18 synthetic datasets grouped into 4 non-stationary meta-datasets. Through experiments on these synthetic benchmarks, we find that RL algorithms must be tailored to the specific type of non-stationarity they encounter. Our results show \texttt{AIF-GEN}'s potential to support the development of RLHF algorithms that continually align LLMs.
Submission Number: 161