A loss curvature account of fine-tuning fragility

Published: 29 May 2026, Last Modified: 29 May 2026, Venue: HiLD at ICML 2026 Poster, Readers: Everyone, License: CC BY 4.0
Keywords: safety fine-tuning, catastrophic forgetting, loss landscape curvature, fine-tuning, Hessian analysis, data mixing, continual learning, Taylor expansion
TL;DR: Fine-tuning is fragile because it lands the model in sharply curved loss minima. Curvature, not just gradient conflict, drives forgetting, and mixing in pre-training data flattens these minima.
Abstract: Fine-tuning on narrow distributions often produces fragile changes that are easily reversed by further training, with implications for the durability of safety fine-tuning. Mixing pre-training data into fine-tuning is a known mitigation, but why varying the proportion of fine-tuning data (which we term concentration) modulates forgetting is poorly understood. During a reversion phase (subsequent training on pre-training data after fine-tuning), we decompose the per-step change in fine-tune loss into its first- and second-order Taylor terms. We then track how each varies with concentration. In experiments on LLMs (Pythia-70M), we find that the second-order (curvature) term grows in importance with concentration, and that this sharpness lies specifically along the reversion update direction, growing monotonically with concentration. Curvature can therefore erase fine-tuned behaviour even when fine-tune and pre-train gradients are not in conflict, providing empirical support for recent theoretical accounts of curvature-driven forgetting.
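To make the decomposition in the abstract concrete, here is a minimal sketch on a toy problem. It assumes the reversion step is a plain SGD update on the pre-training loss, so that the per-step change in fine-tune loss is approximately g_ft · Δθ + ½ Δθᵀ H_ft Δθ with Δθ = -lr · g_pt. The quadratic losses, the learning rate, and the helper names (`grad`, `taylor_terms`) are hypothetical and chosen for illustration; they are not the paper's setup, which uses Pythia-70M.

```python
# Toy illustration only: hypothetical quadratic losses, not the paper's experiments.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def grad(loss, theta, eps=1e-5):
    """Central-difference gradient of `loss` at `theta`."""
    g = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += eps
        dn[i] -= eps
        g.append((loss(up) - loss(dn)) / (2 * eps))
    return g

def taylor_terms(ft_loss, theta, delta, eps=1e-4):
    """First- and second-order Taylor terms of the change in ft_loss for step delta."""
    g = grad(ft_loss, theta)
    first = dot(g, delta)  # gradient (conflict/alignment) term
    # Hessian-vector product H @ delta via central difference of gradients.
    plus = [t + eps * d for t, d in zip(theta, delta)]
    minus = [t - eps * d for t, d in zip(theta, delta)]
    g_plus = grad(ft_loss, plus)
    g_minus = grad(ft_loss, minus)
    hv = [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]
    second = 0.5 * dot(delta, hv)  # curvature term along the update direction
    return first, second

def ft_loss(theta):
    # Sharp along axis 0 (curvature 10), flat along axis 1 (curvature 1).
    return 0.5 * (10.0 * theta[0] ** 2 + theta[1] ** 2)

def pt_loss(theta):
    # Assumed pre-training loss with its minimum at (1, 1).
    return 0.5 * ((theta[0] - 1.0) ** 2 + (theta[1] - 1.0) ** 2)

theta = [0.0, 0.0]  # parameters sit exactly at the fine-tune minimum
lr = 0.1            # assumed reversion learning rate
delta = [-lr * g for g in grad(pt_loss, theta)]  # one SGD reversion step

first, second = taylor_terms(ft_loss, theta, delta)
actual = ft_loss([t + d for t, d in zip(theta, delta)]) - ft_loss(theta)
print(first, second, actual)
```

In this toy case the fine-tune gradient is zero at the starting point, so the first-order term vanishes and there is no gradient conflict, yet the fine-tune loss still rises because of the second-order term. Because the toy loss is quadratic, the two terms sum to the actual loss change up to numerical error; for a real network the expansion is only approximate, and the Hessian-vector product would typically come from automatic differentiation rather than finite differences.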
Submission Number: 23