Keywords: large language model, post-training, model editing
Abstract: Post-training has emerged as a crucial paradigm for adapting large-scale pre-trained models to various tasks; its effects are fully captured by delta parameters (i.e., the difference between post-trained and pre-trained parameters).
While numerous studies have explored delta parameter properties via operations like pruning, quantization, low-rank approximation, and extrapolation, a fundamental question remains: what properties of delta parameters are essential for maintaining performance?
In this work, we investigate delta parameter properties along two dimensions: magnitude and sign. Through experiments on instruct language models, reasoning language models, and vision models, we find that delta parameters exhibit considerable plasticity: individual values, distribution shape, relative relationships, and even signs can be substantially modified while maintaining the post-trained model's performance.
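As a concrete illustration of the kind of edit studied here, below is a minimal PyTorch sketch of delta-parameter editing: compute the delta, perturb it (here, a random sign flip that preserves magnitudes), and add the edited delta back onto the pre-trained weights. The function names and the `flip_prob` parameter are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def delta_parameters(pre: dict, post: dict) -> dict:
    """Delta parameters: post-trained minus pre-trained weights."""
    return {name: post[name] - pre[name] for name in pre}

def flip_signs(delta: dict, flip_prob: float = 0.5, seed: int = 0) -> dict:
    """Randomly flip the sign of a fraction of delta entries while
    keeping their magnitudes unchanged (one example of a sign edit)."""
    g = torch.Generator().manual_seed(seed)
    edited = {}
    for name, d in delta.items():
        mask = torch.rand(d.shape, generator=g) < flip_prob
        edited[name] = torch.where(mask, -d, d)
    return edited

def apply_delta(pre: dict, delta: dict) -> dict:
    """Reconstruct an edited model: pre-trained weights plus edited delta."""
    return {name: pre[name] + delta[name] for name in pre}
```

Magnitude edits (pruning, quantization, rescaling) fit the same pattern: only the transformation applied to the delta dictionary changes before it is added back.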
To understand these phenomena, we develop a loss-based theoretical framework that analyzes editing effects through a second-order Taylor expansion. Our analysis introduces the concept of editing intensity, which helps explain the stability boundaries of different editing operations, and identifies the mean and the relative relationships of delta parameters as key factors from a theoretical perspective.
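For reference, the standard form of such an expansion (our notation; the paper's exact definition of editing intensity may differ): with $\theta$ the post-trained parameters and $\Delta\theta$ the perturbation that editing the delta parameters induces on the model, the loss change is approximated by

```latex
\Delta \mathcal{L}
  \;=\; \mathcal{L}(\theta + \Delta\theta) - \mathcal{L}(\theta)
  \;\approx\; \nabla \mathcal{L}(\theta)^{\top} \Delta\theta
  \;+\; \tfrac{1}{2}\, \Delta\theta^{\top} H(\theta)\, \Delta\theta ,
```

where $H(\theta)$ is the Hessian of the loss. On this reading, an edit preserves performance when $\Delta\theta$ is small or lies in low-curvature directions of $H$, which is one way to interpret an editing-intensity bound.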
Paper Type: Long
Research Area: Language Models
Research Area Keywords: fine-tuning, continual learning
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 10308