Beyond Parameter Averaging in Model Aggregation
Keywords: self-supervised learning, Fisher merging, model aggregation
Abstract: The success of foundation models is strongly linked to scale, which has reinforced interest in federated learning. Yet despite the prohibitive cost of training a large language model (LLM), little attention has been paid to reusing pre-trained models in collaborative training settings. Self-supervision has also played an important role in this success, but its emphasis has been primarily on data. This paper leverages Bayesian principles to bring self-supervision into the model aggregation toolbox. It introduces self-supervised Fisher merging, a framework that successfully merges models in parameter space without revisiting data, opening a new door to model reusability. Experimental results establish the foundation of our method on tractable linear models and highlight its potential for aggregating neural networks.
Student Author Indication: Yes
Submission Number: 24