SIB: Reparameterization of LLMs for Better Learning-Forgetting under SFT

Published: 24 May 2026, Last Modified: 24 May 2026. Venue: ICML 2026 Workshop WSS Poster. License: CC BY 4.0
Keywords: weight-space symmetries, post-training, finetuning, quantization, forgetting
TL;DR: Catastrophic forgetting and quantization degradation are both loss-geometry problems. We exploit transformer scale symmetries to flatten the landscape post-hoc, Pareto-improving the SFT learning-forgetting trade-off for free.
Abstract: Supervised finetuning (SFT) of pretrained language models trades off the acquisition of new domain capabilities against the retention of prior knowledge. Degradation from post-training quantization (PTQ) and catastrophic forgetting from finetuning are increasingly seen as loss-geometry problems, in which flatness leads to lower degradation. In this work, we adopt a unified view of post-training perturbations. In particular, inspired by PTQ, we propose Scale Invariant Balancing (SIB), a functionally equivalent reparameterization within the weight-space symmetries that flattens the loss landscape. We extensively characterize the learning-forgetting trade-off for SFT and SIB. Across models and methods, two regimes universally emerge: either baseline SFT exhibits a gradual trade-off between learning and forgetting, in which case SIB can be applied to approximate Pareto optimality, or baseline SFT already does not forget, in which case SIB does not substantially intervene.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 54
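
Illustrative note: the abstract describes SIB as a functionally equivalent reparameterization within the weight-space symmetries. The sketch below shows only the generic kind of scale symmetry such a method could exploit, using a toy two-layer ReLU network rather than a transformer. It is not the authors' algorithm: the functions mlp and balance, the norm-matching rule, and all shapes and values are assumptions made up for this illustration.

```python
import math
import random

def mlp(x, w_in, w_out):
    """Toy two-layer ReLU MLP without biases, applied to one input vector x."""
    hidden = [max(0.0, sum(xi * w_in[i][j] for i, xi in enumerate(x)))
              for j in range(len(w_out))]
    return [sum(h * w_out[j][k] for j, h in enumerate(hidden))
            for k in range(len(w_out[0]))]

def balance(w_in, w_out):
    """Rescale hidden unit j by s_j > 0: incoming weights * s_j, outgoing / s_j.

    Because relu(s * z) == s * relu(z) for s > 0, the network output is unchanged.
    The choice s_j = sqrt(out_norm / in_norm) equalizes the two norms; this rule
    is a hypothetical example, not the rule used by SIB.
    Assumes every hidden unit has nonzero incoming and outgoing weight norms.
    """
    n_hidden = len(w_out)
    scales = []
    for j in range(n_hidden):
        in_norm = math.sqrt(sum(row[j] ** 2 for row in w_in))
        out_norm = math.sqrt(sum(v ** 2 for v in w_out[j]))
        scales.append(math.sqrt(out_norm / in_norm))
    new_in = [[row[j] * scales[j] for j in range(n_hidden)] for row in w_in]
    new_out = [[v / scales[j] for v in w_out[j]] for j in range(n_hidden)]
    return new_in, new_out

# Deliberately unbalanced toy weights: large first layer, small second layer.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(4)]
w_in = [[random.gauss(0, 5.0) for _ in range(6)] for _ in range(4)]
w_out = [[random.gauss(0, 0.1) for _ in range(3)] for _ in range(6)]

w_in_b, w_out_b = balance(w_in, w_out)
max_diff = max(abs(a - b)
               for a, b in zip(mlp(x, w_in, w_out), mlp(x, w_in_b, w_out_b)))
print("largest output difference after rescaling:", max_diff)
```

The two weight settings are different points in parameter space that compute the same function up to floating-point rounding, so the local loss geometry around them can differ even though the loss value does not.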