Abstract: Multi-AI collaboration, such as ensembling or debating large language models (LLMs), is a promising paradigm for aggregating information and boosting performance. A foundational step in these pipelines is to feed the responses of several \emph{proposer} LLMs into a \emph{summarizer} LLM, which synthesizes them into a better answer. However, choosing which proposers to include is non-trivial. Existing approaches focus primarily on either accuracy (picking the strongest models) or diversity (ensuring variety), and often overlook how proposers interact with one another and with the summarizer.
We reframe proposer selection as a combinatorial selection problem akin to feature selection, where the value of an LLM lies in its \emph{complementarity} with others. However, directly applying standard feature-selection algorithms is impractical in the LLM setting due to prohibitive time complexity. Motivated by this limitation, we explore an extensive range of computationally feasible, greedy-style selection algorithms that assess complementarity using a small labeled set. Our experiments validate complementarity as a guiding principle for proposer selection and identify methods that achieve the best performance–cost trade-offs in practice.
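To make the idea of greedy, complementarity-driven selection concrete, here is a minimal sketch of generic greedy forward selection. It is not the paper's specific algorithm. The names `greedy_select`, `coverage_score`, `SOLVED`, and the `model_*` proposers are hypothetical, and the toy score (fraction of labeled questions solved by at least one selected proposer) only stands in for whatever summarizer-based metric would be measured on the small labeled set.

```python
from typing import Callable, FrozenSet, List, Sequence


def greedy_select(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Greedy forward selection: repeatedly add the candidate whose
    inclusion yields the largest improvement in `score`."""
    selected: List[str] = []
    current = score(frozenset())
    while len(selected) < k:
        best_name, best_value = None, current
        for name in candidates:
            if name in selected:
                continue
            value = score(frozenset(selected) | {name})
            if value > best_value:
                best_name, best_value = name, value
        if best_name is None:  # no remaining candidate improves the score
            break
        selected.append(best_name)
        current = best_value
    return selected


# Hypothetical toy data: which labeled questions each proposer answers correctly.
SOLVED = {
    "model_a": {0, 1, 2, 3, 4, 5},
    "model_b": {0, 1, 2, 3, 4},
    "model_c": {6, 7},
}
NUM_QUESTIONS = 8


def coverage_score(subset: FrozenSet[str]) -> float:
    """Assumed stand-in for summarizer accuracy: fraction of questions
    solved by at least one proposer in the subset."""
    covered = set().union(*(SOLVED[m] for m in subset)) if subset else set()
    return len(covered) / NUM_QUESTIONS


chosen = greedy_select(list(SOLVED), coverage_score, k=2)
print(chosen)  # ['model_a', 'model_c']
```

In this toy setup, picking the two individually strongest proposers (`model_a` and `model_b`) covers 6 of 8 questions, while the greedy choice pairs `model_a` with the weaker but complementary `model_c` and covers all 8. In a real pipeline, `score` would presumably run the summarizer on the selected proposers' responses, so each call is costly and the number of evaluations matters.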
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=EPjH3w6RFP
Changes Since Last Submission: Desk-rejected due to format. Changed format.
Assigned Action Editor: ~Jiawei_Zhang6
Submission Number: 9310