Enabling Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

Anonymous

16 Feb 2024 | ACL ARR 2024 February Blind Submission | Readers: Everyone
Abstract: Large language models (LLMs) have shown complementary strengths across various tasks and cases, motivating research on ensembling LLMs to push the frontier by leveraging the wisdom of the crowd. Existing work achieves this goal by training an extra reward model or fusion model to select among or fuse all candidate answers. However, these methods pose a great challenge to the generalizability of the trained models. Moreover, existing methods use textual responses as the communication medium, ignoring the rich information in the inner representations of neural networks. Therefore, we propose a training-free ensemble framework, \textbf{\textsc{DeePEn}}, which averages the probability distributions output by different LLMs. A key challenge in this paradigm is the vocabulary discrepancy between heterogeneous LLMs, which hinders the averaging of probability distributions. To address this challenge, \textsc{DeePEn} maps the probability distribution of each model from its probability space to a universal \textit{relative space}, based on the theory of relative representations, and performs aggregation there. The aggregation result is then mapped back to the probability space of one LLM via a search-based inverse transformation to determine the generated token. We conduct experiments on ensembles of various LLMs ranging from 6B to 70B parameters. Experimental results show that \textsc{DeePEn} achieves consistent improvements across six popular benchmarks covering subject examination, reasoning, and knowledge QA, demonstrating the effectiveness of our approach.
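The abstract describes mapping each model's next-token distribution into a shared relative space, averaging there, and then decoding with one model's vocabulary. Below is a minimal Python sketch of that idea under stated assumptions: anchor tokens shared by both vocabularies are given, relative representations are cosine similarities to those anchors, and the "search-based inverse transformation" is approximated by a nearest-token lookup. All names (e.g., `relative_transform`, `search_inverse`) and the toy data are illustrative, not the paper's actual implementation.

```python
import numpy as np

def relative_transform(emb: np.ndarray, anchor_ids: np.ndarray) -> np.ndarray:
    """Represent each token by its cosine similarity to a set of anchor tokens."""
    emb_n = emb / np.linalg.norm(emb, axis=-1, keepdims=True)
    anchors = emb_n[anchor_ids]                    # (num_anchors, dim)
    return emb_n @ anchors.T                       # (vocab_size, num_anchors)

def to_relative_space(p: np.ndarray, rel_matrix: np.ndarray) -> np.ndarray:
    """Map a probability distribution over the vocabulary into the relative space
    as a probability-weighted sum of token relative representations."""
    return p @ rel_matrix                          # (num_anchors,)

def search_inverse(agg: np.ndarray, rel_matrix: np.ndarray) -> int:
    """Stand-in for the search-based inverse: return the token of the decoding
    model whose relative representation best matches the aggregated vector."""
    return int(np.argmax(rel_matrix @ agg))

# Toy example: two "models" with different vocabulary sizes sharing 10 anchor tokens.
rng = np.random.default_rng(0)
emb_a, emb_b = rng.normal(size=(100, 32)), rng.normal(size=(120, 32))
anchors_a, anchors_b = np.arange(10), np.arange(10)  # indices of shared anchor tokens
rel_a = relative_transform(emb_a, anchors_a)
rel_b = relative_transform(emb_b, anchors_b)

p_a = rng.dirichlet(np.ones(100))                  # model A's next-token distribution
p_b = rng.dirichlet(np.ones(120))                  # model B's next-token distribution
aggregated = 0.5 * to_relative_space(p_a, rel_a) + 0.5 * to_relative_space(p_b, rel_b)
next_token = search_inverse(aggregated, rel_a)     # decode with model A's vocabulary
```

This sketch only illustrates the training-free averaging pipeline; the paper's actual anchor selection, aggregation weights, and inverse search may differ.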
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment
Languages Studied: English
0 Replies
