SeedFT: Structure-Preserving Fusion for Multi-Seed LLM Fine-Tuning

15 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Everyone | CC BY 4.0
Keywords: Large Language Model, Optimization, Fine-Tuning
Abstract: Fine-tuning large language models exhibits high variance across random seeds, often requiring multiple runs to find the best checkpoint. While ensemble methods can leverage this diversity, they incur prohibitive computational costs during inference, and existing model merging techniques rely on element-wise operations that treat weight matrices as vectors, destroying the geometric structure essential for effective knowledge consolidation. We address this limitation through SeedFT, a training-free fusion method that preserves matrix geometry while consolidating complementary capabilities from multiple seed-specific fine-tuned models. Our approach builds on two key observations: layer-wise fine-tuning updates contain substantial redundancy, with the top 50% of singular directions preserving over 99% of model performance, and different random seeds learn complementary sub-skills within the same task domain. SeedFT operates through structure-preserving aggregation in two stages: first aligning seed-specific updates in a shared SVD-derived subspace, then fusing these aligned representations via orthogonality-constrained optimization with a closed-form solution. Across mathematical reasoning, commonsense reasoning, and code generation benchmarks, SeedFT consistently matches or exceeds the best individual seed while outperforming element-wise baselines. On MetaMathQA, SeedFT achieves relative improvements of 5.7% and 8.3% on GSM8K and MATH, respectively, without additional training or inference cost.
Primary Area: optimization
Submission Number: 5743
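
The two ingredients named in the abstract, truncating a layer-wise update to its top singular directions and expressing several seed-specific updates in one shared SVD-derived basis, can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the SeedFT algorithm: the function names (`truncate_update`, `shared_basis`, `fuse_in_subspace`) are hypothetical, the shared basis is assumed to come from an SVD of the column-wise concatenated updates, and the fusion step uses a plain average as a stand-in for the orthogonality-constrained closed-form solution, which the abstract does not spell out.

```python
import numpy as np

def truncate_update(delta, keep_frac=0.5):
    """Keep the top `keep_frac` fraction of singular directions of one update."""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    k = max(1, int(np.ceil(keep_frac * len(s))))
    # Rank-k reconstruction: scale the first k left vectors by their singular values.
    return (U[:, :k] * s[:k]) @ Vt[:k]

def shared_basis(deltas, rank):
    """Assumed construction: top-`rank` left singular vectors of the
    column-wise concatenation of all seed-specific updates."""
    stacked = np.concatenate(deltas, axis=1)  # shape (d_out, n_seeds * d_in)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :rank]

def fuse_in_subspace(deltas, rank):
    """Project each update onto the shared basis, combine, and map back.
    The mean is a placeholder, NOT the paper's orthogonality-constrained solver."""
    U = shared_basis(deltas, rank)
    coords = [U.T @ d for d in deltas]  # each of shape (rank, d_in)
    fused = np.mean(coords, axis=0)
    return U @ fused

# Toy example: three "seeds" produce 6x4 updates for the same layer.
rng = np.random.default_rng(0)
seed_deltas = [rng.standard_normal((6, 4)) for _ in range(3)]
merged = fuse_in_subspace([truncate_update(d) for d in seed_deltas], rank=4)
```

Unlike element-wise averaging of flattened weights, every step here acts on the updates as matrices, so the merged result stays inside the span of the shared basis; how closely this mirrors the submission's actual alignment and fusion stages is an open assumption.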