Keywords: Label Aggregation, LLM, Data Annotation
TL;DR: We train a language model to expertly aggregate conflicting labels and justifications from other LLMs, creating a highly accurate and general-purpose aggregator.
Abstract: The rise of large language models (LLMs) as annotators has introduced new opportunities and challenges for label aggregation in data annotation pipelines. While traditional aggregation methods are designed for human crowd workers with independent judgments, they fall short on LLM-generated annotations, which are highly correlated and come with rich explanatory justifications. To address these challenges, we introduce RFAgg, a reinforcement learning framework that dynamically aggregates LLM annotations by jointly modeling both labels and their corresponding justifications. To train RFAgg, we construct the AGG dataset by collecting question-answer pairs generated by different LLMs across various datasets. RFAgg then prompts LLMs to generate multiple aggregation responses, each containing reasoning tokens and a final answer, for every input, and updates the model with our proposed aggregation reward functions via a policy optimization algorithm. Experiments demonstrate that RFAgg significantly outperforms both classical and recent aggregation methods. Most notably, it serves as a general aggregation model, generalizing well to out-of-domain and previously unseen tasks. Despite being trained only on a limited set of classification tasks, RFAgg achieves an average improvement of 2.45% on diverse objective tasks and 5.2% on the subjective Alpaca 2.0 task over its base model. We will publicly release the AGG dataset and our source code.
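To make the reward-and-update step concrete, here is a minimal sketch of what an aggregation reward of this kind could look like. Everything in it is an assumption for illustration, not the paper's actual design: the `<think>`/`<answer>` response format, the `format_reward`/`accuracy_reward` decomposition, the 0.2/0.8 weighting, and the group-relative baseline used for the advantage.

```python
# Hypothetical sketch of an RFAgg-style aggregation reward, inferred only
# from the abstract. Tag format, reward decomposition, and weights are all
# assumptions for illustration.
import re

def parse_response(text: str):
    """Split an aggregator response into reasoning tokens and a final answer.
    Assumes (hypothetically) a <think>...</think><answer>...</answer> format."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (think.group(1).strip() if think else None,
            answer.group(1).strip() if answer else None)

def aggregation_reward(response: str, gold_label: str) -> float:
    """Combine a format reward (well-formed reasoning plus answer) with an
    accuracy reward (final answer matches the gold label)."""
    reasoning, answer = parse_response(response)
    format_reward = 1.0 if reasoning is not None and answer is not None else 0.0
    accuracy_reward = 1.0 if answer is not None and answer.lower() == gold_label.lower() else 0.0
    return 0.2 * format_reward + 0.8 * accuracy_reward  # assumed weighting

# Scoring several sampled aggregation responses for one input; a
# group-relative policy update (GRPO-style) would use reward minus the
# group mean as the advantage for each response.
responses = [
    "<think>Annotators A and B agree; C's justification is weak.</think>"
    "<answer>positive</answer>",
    "<think>Split vote, leaning with annotator C.</think>"
    "<answer>negative</answer>",
]
rewards = [aggregation_reward(r, gold_label="positive") for r in responses]
baseline = sum(rewards) / len(rewards)
advantages = [r - baseline for r in rewards]
print(rewards, advantages)  # e.g. [1.0, 0.2] and [0.4, -0.4]
```

The policy optimization algorithm itself (which the abstract names but does not specify) would consume these per-response advantages to update the aggregator model.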
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6659