Gaussian Stochastic Weight Averaging for Bayesian Low-rank Adaptation of Large Language Models

Published: 27 May 2024, Last Modified: 27 May 2024, AABI 2024, CC BY 4.0
Keywords: Gaussian Stochastic Weight Averaging, Approximate Bayesian Inference, Natural Language Processing, Large Language Models, Low-Rank Adaptation
TL;DR: We demonstrate that a simple combination of Low-Rank Adaptation with Gaussian Stochastic Weight Averaging enhances Large Language Models’ generalization, calibration, and robustness against distribution shifts.
Abstract: Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration, particularly when fine-tuned on small datasets. To address these challenges, we propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG), facilitating approximate Bayesian inference in LLMs. Through extensive testing across several Natural Language Processing (NLP) benchmarks, we demonstrate that our straightforward and computationally efficient approach improves model generalization and calibration. We further show that our method exhibits greater robustness against distribution shift, as reflected in its improved performance on out-of-distribution tasks.
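Since the method is a composition of two published techniques, a minimal sketch is easy to give. The PyTorch snippet below tracks SWAG statistics over the LoRA parameters only, which is what makes the approach computationally cheap: the Gaussian posterior approximation lives in the low-rank adapter weights, not the full model. This is a hedged illustration, not the paper's released implementation; the class name `SwagDiagonal`, the collection schedule, and the sampling scale are assumptions, and full SWAG (Maddox et al., 2019) additionally maintains a low-rank covariance term omitted here for brevity.

```python
# A minimal sketch of SWAG-Diagonal over LoRA parameters, assuming the
# LoRA adapter weights are the model's only trainable parameters.
# Illustrative only; names and schedules are not from the paper's code.
import torch


class SwagDiagonal:
    """Running mean and uncentered second moment of parameters collected
    along the fine-tuning trajectory, yielding a diagonal Gaussian
    posterior approximation over the LoRA weights."""

    def __init__(self, params):
        self.params = list(params)
        self.n = 0
        self.mean = [torch.zeros_like(p) for p in self.params]
        self.sq_mean = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def collect(self):
        # Update running first and second moments with the current iterate.
        self.n += 1
        for m, s, p in zip(self.mean, self.sq_mean, self.params):
            m.mul_((self.n - 1) / self.n).add_(p / self.n)
            s.mul_((self.n - 1) / self.n).add_(p.pow(2) / self.n)

    @torch.no_grad()
    def sample(self, scale=1.0):
        # Draw one weight sample from N(mean, diag(sq_mean - mean^2))
        # and write it into the model in place.
        for m, s, p in zip(self.mean, self.sq_mean, self.params):
            var = torch.clamp(s - m.pow(2), min=1e-30)
            p.copy_(m + scale * var.sqrt() * torch.randn_like(m))


# Usage sketch: during the final epochs of LoRA fine-tuning, call
# swag.collect() every few optimizer steps; at test time, average the
# predictive distributions obtained from several swag.sample() draws.
```

Averaging predictions over several posterior samples, rather than using a single point estimate, is what drives the calibration and out-of-distribution gains the abstract reports.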
Submission Number: 24