Keywords: Ensembling language models, Ensemble method, Generative, Instruction response
TL;DR: SpecEM is a plug-and-play LLM ensemble that boosts strong models and enables fast, segment-level collaboration. It outperforms prior methods across six tasks and five LLMs.
Abstract: Ensembles of generative large language models (LLMs) are a promising way to compensate for individual model limitations by integrating the strengths of different LLMs. Existing LLM ensemble methods, however, face limitations such as first-token delay and difficulty with long-range semantic collaboration between models. Moreover, they typically assume equal voting weights for all models during ensembling, ignoring performance differences between models on a given task. In this work, we propose SpecEM, a training-free, plug-and-play LLM ensemble framework that dynamically adjusts each model's contribution in real time based on task performance. Inspired by speculative decoding, SpecEM iteratively performs drafting and verification, allowing models to collaborate semantically at the segment level to produce an integrated output. Furthermore, we introduce an online feedback mechanism with multiplicative weight updates, where each model's voting weight is adjusted on the fly according to how often it "outperforms" the others during the verification stage, ensuring that stronger models exert greater influence on the ensemble during generation. Experimental results on five popular LLMs (ranging from 7B to 72B parameters) and six benchmark tasks, spanning instruction following, reasoning, commonsense, and general instruction response, demonstrate consistent performance improvements over state-of-the-art LLM ensemble methods.
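For illustration, below is a minimal Python sketch of one segment-level draft-and-verify round combined with a multiplicative weight update, as described in the abstract. All names here (specem_round, score_fn, eta) are hypothetical stand-ins inferred from the abstract, not the authors' implementation.

    import numpy as np

    def specem_round(models, weights, prompt, score_fn, eta=0.5):
        """One illustrative draft-and-verify round of a SpecEM-style ensemble.

        models   : list of callables, each mapping a prompt to a candidate next segment
        weights  : np.ndarray of per-model voting weights (normalized to sum to 1)
        score_fn : callable scoring a candidate segment given the prompt
                   (a stand-in for the paper's verification step)
        eta      : multiplicative-weights learning rate (assumed parameter)
        """
        # Drafting: every model proposes a candidate next segment.
        candidates = [m(prompt) for m in models]

        # Verification: score each candidate; votes are weighted by the
        # current per-model weights, so stronger models have more influence.
        scores = np.array([score_fn(prompt, c) for c in candidates])
        winner = int(np.argmax(weights * scores))

        # Online feedback: multiplicatively boost models whose drafts score
        # best this round, so their influence grows as generation proceeds.
        gains = (scores == scores.max()).astype(float)
        weights = weights * np.exp(eta * gains)
        weights = weights / weights.sum()

        return candidates[winner], weights

Running this round repeatedly, appending each winning segment to the prompt, yields the iterative segment-level collaboration the abstract describes; the exponential update is one standard multiplicative-weights rule, chosen here only as a plausible instance.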
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 14462