Improving Model Alignment Through Collective Intelligence of Open-Source Models

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We show how to leverage mixture-of-agents to generate synthetic data and feedback to effectively align models.
Abstract: Building helpful and harmless large language models (LLMs) requires an effective model alignment approach based on human instructions and feedback, which in turn necessitates high-quality human-labeled data. Constructing such datasets is often expensive and hard to scale, and they may face limitations in diversity and generalization. To address these challenges, we introduce Mixture of Agents Alignment (MoAA), which leverages the collective strengths of various language models to provide high-quality data for model alignment. By employing MoAA, we enhance both supervised fine-tuning and preference optimization, leading to improved performance compared to using a single model (e.g., GPT-4o) to generate alignment data. Evaluation results show that our approach improves the win rate of LLaMA-3.1-8B-Instruct from 19.5 to 48.3 on Arena-Hard and from 22.33 to 57.23 on AlpacaEval2, highlighting a promising direction for model alignment through this scalable and diverse synthetic data recipe. Furthermore, we demonstrate that MoAA enables a self-improvement pipeline, in which models fine-tuned on MoA-generated data surpass their own initial capabilities, providing evidence that our approach can push the frontier of open-source LLMs without reliance on stronger external supervision. Data and code will be released.
Lay Summary: Building large language models that reliably follow human instructions and align with human preferences usually demands huge amounts of costly, hand-labeled data. A common shortcut is to let one strong model such as GPT-4 generate this data, but relying on a single source is expensive, opaque, and can bake in its own biases. We introduce Mixture-of-Agents Alignment (MoAA), in which several open-source models collaborate to craft richer training examples. Our method is a two-step pipeline: first, a panel of open-source models collectively generates high-quality data; then, in a second stage, another panel of models ranks several candidate responses and shows the model which ones it should prefer. The collective's varied strengths yield synthetic data that is more diverse, accurate, and safe than any individual model's output. Using MoAA, an 8-billion-parameter Llama-3 model jumped from 20% to 48% wins over GPT-4 on the tough Arena-Hard test and more than doubled its score on AlpacaEval. Because MoAA relies only on freely available models, the method is transparent and low-cost. By turning a crowd of good, but imperfect, models into a self-improving team, MoAA offers a scalable route to safer, more helpful AI. We release the dataset and models used in our pipeline.
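To make the two-stage pipeline concrete, below is a minimal Python sketch of how mixture-of-agents data generation and panel-based preference ranking could be wired together. The model names and the `query_model` / `judge_score` helpers are hypothetical placeholders introduced for illustration only, not the authors' released implementation.

```python
import random

# Hypothetical model panels; a real pipeline would use actual open-source LLMs.
PROPOSERS = ["open-model-A", "open-model-B", "open-model-C"]   # draft candidate answers
AGGREGATOR = "open-model-D"                                    # synthesizes the final answer
JUDGES = ["open-model-E", "open-model-F"]                      # rank responses for preference data


def query_model(model: str, prompt: str) -> str:
    """Placeholder for an LLM call; a real pipeline would query the named model."""
    return f"[{model}] draft answer for: {prompt[:40]}"


def judge_score(judge: str, instruction: str, candidate: str) -> float:
    """Placeholder judge; a real pipeline would parse a numeric rating from the judge model."""
    return random.random()


def moaa_sft_example(instruction: str) -> dict:
    """Stage 1 (SFT data): proposers draft answers, an aggregator synthesizes the target."""
    drafts = [query_model(m, instruction) for m in PROPOSERS]
    agg_prompt = instruction + "\n\nCandidate answers:\n" + "\n".join(drafts)
    return {"instruction": instruction, "response": query_model(AGGREGATOR, agg_prompt)}


def moaa_preference_pair(instruction: str, candidates: list[str]) -> dict:
    """Stage 2 (preference data): a judge panel scores candidates to form a chosen/rejected pair."""
    avg = [sum(judge_score(j, instruction, c) for j in JUDGES) / len(JUDGES) for c in candidates]
    return {
        "prompt": instruction,
        "chosen": candidates[avg.index(max(avg))],
        "rejected": candidates[avg.index(min(avg))],
    }


if __name__ == "__main__":
    print(moaa_sft_example("Explain why the sky is blue."))
    print(moaa_preference_pair(
        "Explain why the sky is blue.",
        ["Because of Rayleigh scattering.", "Because the ocean reflects onto it."],
    ))
```

The sketch only shows the data flow: collective generation for supervised fine-tuning, then panel ranking for preference optimization; prompt templates, scoring, and model choices are assumptions.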
Primary Area: Deep Learning->Large Language Models
Keywords: Alignment, Open-Source Model, Mixture of Agents
Submission Number: 13514