Keep the Alignment, Skip the Overhead: Lightweight Instruction Alignment for Continually Trained LLMs
Keywords: Domain Adaptation, LLM Continuous Pre-training, Instruction Fine-tuning
Abstract: Instruction fine-tuning aligns language models with human intent but is computationally costly. Continuous pretraining on domain-specific data, while effective for adaptation, can degrade instruction-following capabilities. We introduce **instruction residuals**—the parameter delta between an instruction-tuned model and its base model—as a lightweight mechanism to recover instruction alignment after domain adaptation. Instruction residuals can be transferred across checkpoints within the same model family, enabling restoration of instruction-following behavior without full retraining. We evaluate our method on LLaMA and Qwen models under domain shifts of up to 1B tokens, showing that instruction residuals effectively preserve alignment while allowing continual domain learning. Our results establish a practical framework for modular, compute-efficient instruction retention in evolving language models.
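The abstract describes instruction residuals as a parameter delta that can be re-applied to a continually pretrained checkpoint. The sketch below illustrates that idea in Python, assuming Hugging Face `transformers` and placeholder model names (`"base-model"`, `"instruct-model"`, `"domain-adapted-model"`); it is not the paper's exact procedure or checkpoints.

```python
# Minimal sketch: compute an instruction residual and add it to a
# continually pretrained checkpoint from the same model family.
# Model names below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM

def cpu_state_dict(model):
    # Detach parameters and move them to CPU so three full models
    # never need to share accelerator memory at once.
    return {k: v.detach().to("cpu") for k, v in model.state_dict().items()}

# 1) Instruction residual = instruction-tuned weights minus base weights.
base = AutoModelForCausalLM.from_pretrained("base-model")          # hypothetical
instruct = AutoModelForCausalLM.from_pretrained("instruct-model")  # hypothetical
base_sd, instruct_sd = cpu_state_dict(base), cpu_state_dict(instruct)
residual = {k: instruct_sd[k] - base_sd[k] for k in base_sd}

# 2) Apply the residual to a domain-adapted checkpoint to restore
#    instruction-following behavior without re-running fine-tuning.
adapted = AutoModelForCausalLM.from_pretrained("domain-adapted-model")  # hypothetical
adapted_sd = cpu_state_dict(adapted)
restored_sd = {k: adapted_sd[k] + residual[k] for k in adapted_sd}
adapted.load_state_dict(restored_sd)
adapted.save_pretrained("domain-adapted-instruct")
```

Under these assumptions, the residual is computed once per model family and then reused across later domain-adapted checkpoints, which is what makes the approach lightweight relative to repeating instruction fine-tuning.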
Submission Number: 29