lo-fi: distributed fine-tuning without communication

Mitchell Wortsman; Suchin Gururangan; Shen Li; Ali Farhadi; Ludwig Schmidt; Michael Rabbat; Ari S. Morcos

Published: 20 Jan 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node fine-tunes independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step. We also observe that lo-fi matches the baseline's performance when fine-tuning OPT language models (up to 1.3B parameters) on Common Crawl. By removing the communication requirement, lo-fi reduces resource barriers for fine-tuning large models and enables fine-tuning in settings with prohibitive communication cost.
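
The two procedures contrasted in the abstract can be illustrated with a toy sketch. It assumes a one-feature linear model trained with plain SGD on mean squared error, which stands in for DeiT or OPT, and it assumes lo-fi's final step is a uniform average of each parameter across nodes. Every function and variable name here (`local_finetune`, `synced_baseline`, `mean_over_nodes`, and so on) is hypothetical, and the "pretrained" weights are made up. In both procedures each node draws the same number of mini-batches from its own shard, so both observe the same amount of data. The only difference is when nodes communicate: the baseline averages gradients at every step, while lo-fi averages weights once at the end.

```python
import random
from typing import Dict, List, Tuple

Params = Dict[str, float]
Example = Tuple[float, float]  # (x, y) pair for the toy model y_hat = w * x + b


def grad(params: Params, batch: List[Example]) -> Params:
    """Mean-squared-error gradient of the toy linear model over one mini-batch."""
    gw = gb = 0.0
    for x, y in batch:
        err = params["w"] * x + params["b"] - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    n = len(batch)
    return {"w": gw / n, "b": gb / n}


def mean_over_nodes(dicts: List[Params]) -> Params:
    """Uniform average of each entry across nodes (used for weights or gradients)."""
    return {k: sum(d[k] for d in dicts) / len(dicts) for k in dicts[0]}


def local_finetune(init: Params, shard: List[Example], steps: int,
                   lr: float, batch_size: int, seed: int) -> Params:
    """One node fine-tunes on its own shard with plain SGD and never communicates."""
    rng = random.Random(seed)
    params = dict(init)  # every node starts from the same pretrained weights
    for _ in range(steps):
        g = grad(params, rng.sample(shard, batch_size))
        for k in params:
            params[k] -= lr * g[k]
    return params


def lofi(init: Params, shards: List[List[Example]], steps: int,
         lr: float, batch_size: int) -> Params:
    """lo-fi as described above: independent local runs, then one weight average."""
    finals = [local_finetune(init, s, steps, lr, batch_size, seed=i)
              for i, s in enumerate(shards)]
    return mean_over_nodes(finals)


def synced_baseline(init: Params, shards: List[List[Example]], steps: int,
                    lr: float, batch_size: int) -> Params:
    """Baseline: nodes share one model and average gradients at every step."""
    rngs = [random.Random(i) for i in range(len(shards))]
    params = dict(init)
    for _ in range(steps):
        grads = [grad(params, rng.sample(s, batch_size))
                 for rng, s in zip(rngs, shards)]
        avg = mean_over_nodes(grads)  # the per-step communication lo-fi removes
        for k in params:
            params[k] -= lr * avg[k]
    return params


def make_data(n: int, seed: int) -> List[Example]:
    """Synthetic data from y = 3x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data


data = make_data(800, seed=0)
num_nodes = 4
shards = [data[i::num_nodes] for i in range(num_nodes)]  # one shard per node
pretrained = {"w": 2.5, "b": 0.5}  # hypothetical "pretrained" starting point

merged = lofi(pretrained, shards, steps=200, lr=0.1, batch_size=8)
synced = synced_baseline(pretrained, shards, steps=200, lr=0.1, batch_size=8)
print("lo-fi:   ", merged)
print("baseline:", synced)
```

Both runs should end near w = 3, b = 1. Because this toy objective is convex, it cannot show why averaging works for large non-convex networks; it only makes the communication pattern of each procedure concrete.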

Submission Length: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: Colin Raffel

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 592