Keywords: LLM ensemble, mixture-of-agents, multi-AI collaboration
TL;DR: We propose a robust framework for agent-level LLM ensembling that selects complementary LLM agents
Abstract: Multi-AI collaboration\textemdash such as ensembling or debating large language models (LLMs)\textemdash is a promising paradigm for aggregating information and boosting performance. A foundational step in these pipelines is to feed the responses of several \emph{proposer} LLMs into a \emph{summarizer} LLM, which synthesizes a better answer. However, choosing which proposers to include is non-trivial. Existing approaches primarily focus either on accuracy (picking the strongest models) or diversity (ensuring variety), and often overlook the interactions among proposers and with the summarizer. We introduce \emph{complementary-MoA}, a principled framework for proposer selection built on the notion of complementarity: the value of a proposer lies not only in its individual performance, but in how it improves the joint performance of the ensemble. Leveraging a small training set with ground-truth answers, we propose several greedy algorithms that explicitly optimize for complementarity while offering accuracy–efficiency trade-offs for proposer selection. Empirically, we demonstrate why accuracy- and diversity-seeking heuristics are fundamentally flawed in LLM ensembles, and validate the robustness and superiority of our complementarity-based methods.
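The complementarity idea in the abstract can be sketched in a few lines: score each candidate proposer by the marginal gain in *joint* ensemble accuracy on a small labeled set, rather than by its standalone accuracy or diversity. This is only an illustrative sketch, not the paper's algorithm: the summarizer is crudely approximated by majority vote, and the proposer correctness matrix is synthetic.

```python
# Illustrative sketch of greedy, complementarity-based proposer selection.
# Assumptions (not from the paper): the summarizer is modeled as a majority
# vote over proposers, and per-question correctness is synthetic.
import random

random.seed(0)
N_QUESTIONS = 200
# Synthetic standalone accuracies for six hypothetical proposers.
accs = [0.70, 0.68, 0.66, 0.55, 0.50, 0.45]
# correct[p][q] = 1 if proposer p answers question q correctly.
correct = [[1 if random.random() < a else 0 for _ in range(N_QUESTIONS)]
           for a in accs]

def ensemble_acc(subset):
    """Joint accuracy of a proposer subset under the majority-vote proxy."""
    if not subset:
        return 0.0
    hits = sum(1 for q in range(N_QUESTIONS)
               if 2 * sum(correct[p][q] for p in subset) > len(subset))
    return hits / N_QUESTIONS

def greedy_select(k):
    """Greedily add the proposer with the largest marginal gain in joint
    accuracy -- i.e., the most *complementary* one, not the most accurate."""
    chosen = []
    for _ in range(k):
        base = ensemble_acc(chosen)
        best, best_gain = None, float("-inf")
        for p in range(len(correct)):
            if p in chosen:
                continue
            gain = ensemble_acc(chosen + [p]) - base
            if gain > best_gain:
                best, best_gain = p, gain
        chosen.append(best)
    return chosen

subset = greedy_select(3)
print(subset, ensemble_acc(subset))
```

Note that a weak proposer (low standalone accuracy) can still be picked if its errors are uncorrelated with those already chosen, which is precisely the failure mode of purely accuracy-seeking heuristics described above.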
Primary Area: generative models
Submission Number: 13945