Knowledge Fusion by Evolving Language Models

16 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: natural language processing, evolution strategy, knowledge fusion, model merging
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Fine-tuning pre-trained language models for downstream NLP tasks is a prevalent technique in NLP research. However, in complex training environments characterized by diverse data domains and tasks, fine-tuned models exhibit varying performance. Fusing the knowledge of individual models therefore plays a pivotal role in improving a single model. This paper examines how to integrate multiple models from diverse training scenarios into a unified model that performs well across data domains and generalizes to out-of-domain data. We propose a knowledge fusion method, termed model evolving, inspired by evolutionary algorithms; it requires neither additional training nor extra training data. Our approach treats the weights of the fine-tuned language models as a population and generates offspring models through mutation and crossover operations. We then evaluate these offspring against their parents and retain the models that perform better on a development dataset. Notably, the proposed model evolving strategy can be combined with existing model merging techniques such as Fisher-weighted averaging and regression mean (RegMean). Through a series of rigorous evaluation experiments, we provide empirical evidence that our proposed method significantly outperforms previous approaches.
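The abstract describes an evolutionary loop over model weights: a population of fine-tuned checkpoints, crossover and mutation to produce offspring, and selection against the parents on a development set. Below is a minimal, hypothetical Python/NumPy sketch of such a loop; the function names (crossover, mutate, evolve, dev_score), the interpolation-based crossover, the Gaussian mutation, and all hyperparameters are illustrative assumptions, not the paper's actual operators or settings.

```python
import random
import numpy as np

def crossover(parent_a, parent_b, alpha=None):
    """Interpolate two parents' parameters with a random mixing weight."""
    if alpha is None:
        alpha = random.random()
    return {name: alpha * parent_a[name] + (1.0 - alpha) * parent_b[name]
            for name in parent_a}

def mutate(weights, sigma=0.01):
    """Perturb every parameter tensor with small Gaussian noise."""
    return {name: w + np.random.normal(0.0, sigma, size=w.shape)
            for name, w in weights.items()}

def evolve(population, dev_score, generations=50, sigma=0.01):
    """Evolve a population of checkpoints (dicts of np.ndarray).

    `dev_score` maps a weight dict to a scalar development-set metric.
    An offspring replaces the weaker of its two parents only when it
    scores at least as well as both of them.
    """
    scores = [dev_score(w) for w in population]
    for _ in range(generations):
        i, j = random.sample(range(len(population)), 2)
        child = mutate(crossover(population[i], population[j]), sigma)
        child_score = dev_score(child)
        if child_score >= max(scores[i], scores[j]):
            weaker = i if scores[i] <= scores[j] else j
            population[weaker], scores[weaker] = child, child_score
    best = int(np.argmax(scores))
    return population[best], scores[best]
```

In practice, `dev_score` would load the candidate weights into the model and compute accuracy (or another metric) on the held-out development data, which is why the procedure needs no gradient updates or extra training data.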
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 685