Keywords: Variational learning, Low-rank adaptation, Large language models
Abstract: Bayesian methods have recently been used to improve the calibration of LoRA fine-tuning, but there is still room for improvement. For instance, Laplace's method yields no effective gains in accuracy, while variational learning can sometimes even harm accuracy and increases both runtime and implementation complexity. Here, we propose two simple modifications to variational learning that fix all of these issues. First, we reduce cost and simplify implementation by adapting the recently proposed IVON optimizer for LoRA training. Second, we propose new scaling and pruning techniques for posteriors to improve the accuracy-uncertainty trade-off. Empirically, these modifications consistently yield multiple benefits over Adam: (a) both accuracy and calibration are boosted; (b) accuracy improves with longer training while overfitting is reduced; (c) test-time scaling is boosted for generation tasks; and (d) data efficiency during training is also improved. Our work proposes new modifications to variational learning that improve many aspects of standard LoRA training.
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 2584