$C^2M^3$: Cycle-Consistent Multi-Model Merging

Donato Crisostomi; Marco Fumero; Daniele Baieri; Florian Bernard; Emanuele Rodolà

Published: 25 Sept 2024, Last Modified: 06 Nov 2024. Venue: NeurIPS 2024 poster. License: CC BY 4.0

Keywords: model merging, linear mode connectivity, deep learning

TL;DR: Based on the conjecture that all modes found by SGD live in the same basin up to permutations, we propose a novel weight-matching procedure based on the Frank-Wolfe algorithm that ensures cycle consistency of the permutations.

Abstract: In this paper, we present a novel data-free method for merging neural networks in weight space. Our method optimizes for the permutations of network neurons while ensuring global coherence across all layers, and it outperforms recent layer-local approaches in a set of challenging scenarios. We then generalize the formulation to the $N$-models scenario to enforce cycle consistency of the permutations with guarantees, allowing circular compositions of permutations to be computed without accumulating error along the path. We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging homogeneous sets of models in scenarios spanning varying architectures and datasets. We finally show that, when coupled with activation renormalization, the approach yields the best results in the task.
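The cycle-consistency property mentioned in the abstract can be illustrated with a small sketch. It assumes neuron permutations are represented as permutation matrices. As one possible construction (an assumption for illustration, not a detail taken from the abstract), it builds each pairwise map by passing through a shared reference space. The helper names `random_permutation_matrix`, `is_cycle_consistent`, and `pairwise` are hypothetical.

```python
import numpy as np


def random_permutation_matrix(n, rng):
    """Return an n x n permutation matrix drawn at random."""
    return np.eye(n, dtype=int)[rng.permutation(n)]


def is_cycle_consistent(p_ab, p_bc, p_ca):
    """Check that composing A->B, B->C, C->A yields the identity."""
    n = p_ab.shape[0]
    return np.array_equal(p_ca @ p_bc @ p_ab, np.eye(n, dtype=int))


def pairwise(p_src, p_dst):
    """Map source -> shared reference -> destination (illustrative construction)."""
    return p_dst.T @ p_src


rng = np.random.default_rng(0)
n = 6

# Pairwise maps estimated independently carry no guarantee that the
# cycle A -> B -> C -> A closes; random matrices stand in for them here.
p_ab, p_bc, p_ca = (random_permutation_matrix(n, rng) for _ in range(3))
independent_ok = is_cycle_consistent(p_ab, p_bc, p_ca)

# One map per model into a shared reference space (an assumption of this
# sketch): every cycle of derived pairwise maps closes by construction,
# because permutation matrices are orthogonal.
p_a, p_b, p_c = (random_permutation_matrix(n, rng) for _ in range(3))
factored_ok = is_cycle_consistent(
    pairwise(p_a, p_b), pairwise(p_b, p_c), pairwise(p_c, p_a)
)

print("independent pairwise maps consistent:", independent_ok)
print("factored pairwise maps consistent:", factored_ok)
```

In the factored case the product telescopes, since each per-model matrix meets its own transpose, so the cycle returns exactly to the starting neuron order regardless of which permutations are chosen.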

Supplementary Material: zip

Primary Area: Deep learning architectures

Submission Number: 19085
