Identifying Geometric Bottlenecks in Single-Stage Training: Observations from the Optimization Manifold

05 Feb 2026 (modified: 02 Mar 2026) · Submitted to Sci4DL 2026 · CC BY 4.0
Keywords: large language models, meta-learning, gradient interference, temporal staging, LoRA adapters, domain adaptation, optimization geometry
TL;DR: We show that bi-level meta-learning for LLM domain adaptation causes gradient interference on heterogeneous tasks, and that the temporally staged TEMPO-LLM restores task-specific updates, sparser adapters, and stronger out-of-domain transfer.
Abstract: Bi-level meta-learning methods for LLM domain adaptation jointly optimize cross-task generalization and task-specific specialization, coupling these objectives into a single nested optimization. We hypothesize that this coupling induces gradient interference under heterogeneous task distributions, forcing models into "compromise" solutions that fail to specialize. To test this hypothesis, we design controlled experiments comparing coupled optimization (MAML-en-LLM) against TEMPO-LLM, a temporally staged alternative that separates consolidation, alignment, and refinement into sequential stages. Our analysis reveals striking behavioral differences: (1) coupled optimization produces uniformly high gradient similarity across diverse tasks, while temporal staging preserves task-specific directions with substantially higher variance; (2) staged optimization generates significantly sparser adaptation parameters with distinct per-domain "signatures," versus the overlapping, "barcode-like" patterns produced by coupled optimization. These findings demonstrate that temporal organization of learning pressures is a structural degree of freedom in neural network optimization that fundamentally shapes adaptation capacity.
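The gradient-interference diagnostic described above can be illustrated with a minimal sketch: compute pairwise cosine similarity between per-task gradient vectors, where near-opposed directions (cosine close to -1) indicate interference and uniformly high similarity indicates the "compromise" regime. The function name and toy vectors below are illustrative assumptions, not the paper's implementation.

```python
import torch

def pairwise_grad_cosine(grads):
    """Pairwise cosine similarity between flattened per-task gradients.

    grads: list of 1-D tensors, one flattened gradient per task.
    Returns an (n_tasks, n_tasks) cosine-similarity matrix.
    """
    # Normalize each gradient, then take inner products.
    G = torch.stack([g / (g.norm() + 1e-12) for g in grads])
    return G @ G.T

# Toy illustration (hypothetical values): two nearly opposed
# task gradients interfere (off-diagonal cosine is negative).
g1 = torch.tensor([1.0, 0.0])
g2 = torch.tensor([-1.0, 0.1])
sim = pairwise_grad_cosine([g1, g2])
```

In practice the same diagnostic would be applied to gradients of adapter parameters collected across heterogeneous tasks, with the variance of the off-diagonal entries distinguishing task-specific from compromise updates.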
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Style Files: I have used the style files.
Challenge: This submission is an entry to the science of DL improvement challenge.
Submission Number: 115