TL;DR: We enable efficient evolutionary model merging on consumer GPUs with IRT-based estimation.
Abstract: Evolutionary model merging enables the creation of high-performing multi-task models but remains computationally prohibitive for consumer hardware. We introduce MERGE$^3$, an efficient framework that makes evolutionary merging of Large Language Models (LLMs) feasible on a single GPU by reducing fitness computation costs by 50× while retaining a large fraction of the original performance. MERGE$^3$ achieves this by **E**xtracting a reduced dataset for evaluation, **E**stimating model abilities using Item Response Theory (IRT), and **E**volving optimal merges via IRT-based performance estimators. Our method enables state-of-the-art multilingual and cross-lingual merging, transferring knowledge across languages with significantly lower computational overhead. We provide theoretical guarantees and an open-source library, democratizing high-quality model merging.
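For readers unfamiliar with IRT, the sketch below illustrates the kind of estimator the abstract alludes to: fit a model's ability from its answers on a small anchor subset of calibrated items, then predict its accuracy on the full benchmark from that ability. The 2PL formulation, the grid-based ability fit, and all function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def irt_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability that a respondent with
    ability `theta` answers an item with discrimination `a` and difficulty `b`
    correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def estimate_ability(responses, a, b, grid=np.linspace(-4, 4, 401)):
    """Grid-search maximum-likelihood ability estimate from a model's 0/1
    responses on a small anchor subset of calibrated items."""
    p = irt_2pl(grid[:, None], a[None, :], b[None, :])        # shape (grid, items)
    p = np.clip(p, 1e-9, 1 - 1e-9)                            # avoid log(0)
    loglik = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

def estimate_accuracy(theta, a_all, b_all):
    """Predict accuracy on the full benchmark from the estimated ability and
    the calibrated parameters of every item (not just the anchor subset)."""
    return irt_2pl(theta, a_all, b_all).mean()
```

In this toy setup, a candidate merge only needs to be evaluated on the anchor items; its full-benchmark score is then read off the item characteristic curves rather than computed directly.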
Lay Summary: Large Language Models are expensive to train, but there’s a clever shortcut: merging open-source models that are already trained. This technique, called *model merging*, allows developers to combine the strengths of existing models into new ones without starting from scratch. It has quickly grown in popularity because it works well and runs on everyday hardware. In fact, about 30% of the models on Hugging Face’s Open LLM Leaderboard are created this way.
However, most merging methods rely on manual tweaking and guesswork, which often limits their effectiveness. In theory, the best approach is *evolutionary merging*, which automatically explores different ways to combine models by simulating a kind of natural selection. Unfortunately, this method is rarely used in practice, as it’s so *computationally demanding* that it’s not feasible on typical hardware, and as a result, evolutionary merges are *virtually absent from public model hubs* like Hugging Face.
**MERGE$^3$** changes that. It makes evolutionary merging practical on a single consumer GPU by cutting the compute cost 50-fold. It does this by evaluating candidate merges on only a small, carefully chosen sample of data and using a technique from educational testing, called *Item Response Theory*, to estimate model performance from that small sample. It then evolves better merges over time, efficiently and effectively.
Despite using just *2%* of the data for the fitness computation of the evolutionary algorithm, MERGE$^3$ produces models that almost match the quality of much more expensive methods. It can transfer skills, such as mathematical reasoning, across languages, and it can create multilingual models that outperform their individual parts. By lowering the hardware and time barriers to evolutionary merging, MERGE$^3$ brings state-of-the-art model merging to everyone.
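To make the "evolves better merges over time" step concrete, here is a minimal, hedged sketch of an evolutionary search over merge coefficients scored by a cheap performance estimator (such as the IRT-based one sketched above). The linear merge operator, mutation scheme, and names are placeholders, not the MERGE$^3$ implementation.

```python
import numpy as np

def linear_merge(models, coeffs):
    """Hypothetical merge operator: a weighted average of parameter dicts."""
    return {name: sum(c * m[name] for c, m in zip(coeffs, models))
            for name in models[0]}

def evolve_merge(models, estimate_fitness, pop_size=16, generations=50, seed=0):
    """Toy (mu + lambda) evolutionary search over mixing coefficients, where
    `estimate_fitness` is a cheap proxy for full-benchmark accuracy."""
    rng = np.random.default_rng(seed)
    pop = rng.dirichlet(np.ones(len(models)), size=pop_size)       # candidate weights
    for _ in range(generations):
        scores = np.array([estimate_fitness(linear_merge(models, c)) for c in pop])
        parents = pop[np.argsort(scores)[-(pop_size // 2):]]       # keep the best half
        children = np.abs(parents + rng.normal(0.0, 0.05, parents.shape))
        children /= children.sum(axis=1, keepdims=True)            # stay on the simplex
        pop = np.vstack([parents, children])
    best = np.argmax([estimate_fitness(linear_merge(models, c)) for c in pop])
    return pop[best]
```

The point of the proxy fitness is that each candidate merge in the loop is scored on a small sample rather than the full benchmark, which is where the 50-fold compute reduction comes from.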
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/tommasomncttn/merge3
Primary Area: Deep Learning->Algorithms
Keywords: Model Merging, Evolutionary Algorithms, Efficient Methods for Machine Learning, Language Models, LLMs, Multilingual Models
Submission Number: 10331