Keywords: Knowledge Alignment, Domain adaptation, Domain Shift Tuning, Lottery Hedge Fund Hypothesis, PEFT, MoE
TL;DR: Dynamic Hierarchical Knowledge Routing (DHKR) is a knowledge-alignment tuning framework.
Abstract: Knowledge-Aligned Domain Shift Tuning (KADA) is a PEFT framework based on the Lottery Hedge Fund Hypothesis (LHFH) to identify and reuse latent knowledge fragments.
Although KADA bridges knowledge gaps between source and target domains,
it relies on a fixed set of subnetworks, which limits flexible adaptation and prevents automatic discovery of optimal model capacity.
Existing MoE and dynamic PEFT methods lack a unified mechanism that jointly enables adaptive capacity growth and strong routing stability.
To address these limitations,
Dynamic Hierarchical Knowledge Routing (DHKR) employs a two-level routing mechanism that expands subnetworks hierarchically on demand (domain $\rightarrow$ modality) and explores and adjusts capacity ($K \times L$) as needed for new domains.
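A minimal sketch of the two-level (domain $\rightarrow$ modality) routing idea, under assumptions not stated in the abstract: the class name `TwoLevelRouter`, dot-product gating against prototype vectors, and the `grow_domain` method are all hypothetical illustrations of on-demand $K \times L$ capacity growth, not the authors' implementation.

```python
class TwoLevelRouter:
    """Illustrative two-level router: a domain gate picks one of K domain
    groups, then a modality gate picks one of L experts within that group,
    giving K x L capacity that can grow on demand. Hypothetical sketch."""

    def __init__(self, domain_protos, modality_protos):
        self.domain_protos = domain_protos      # K prototype vectors, one per domain group
        self.modality_protos = modality_protos  # K lists of L prototype vectors each

    @staticmethod
    def _argmax_score(x, protos):
        # Dot-product similarity against each prototype; pick the best match.
        scores = [sum(a * b for a, b in zip(x, p)) for p in protos]
        return max(range(len(scores)), key=scores.__getitem__)

    def route(self, x):
        k = self._argmax_score(x, self.domain_protos)       # level 1: domain group
        l = self._argmax_score(x, self.modality_protos[k])  # level 2: expert in group
        return k, l

    def grow_domain(self, proto, new_modality_protos):
        """On-demand capacity growth: append a new domain group (K -> K + 1)."""
        self.domain_protos.append(proto)
        self.modality_protos.append(new_modality_protos)
```

For example, a router with two orthogonal domain prototypes sends `[1.0, 0.0]` to group 0 and `[0.0, 1.0]` to group 1, and `grow_domain` adds a third group without retraining the existing ones.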
To support dynamic capacity growth, DHKR stabilizes routing via a composite growth trigger—monitoring stagnation, entropy, imbalance, and instability—and multi-level Loss-Free Balancing (LFB).
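The composite growth trigger can be sketched as a conjunction of the four monitored signals. The function name, thresholds, and exact statistics below are assumptions for illustration; the abstract only states that stagnation, entropy, imbalance, and instability are monitored jointly.

```python
import math

def routing_entropy(probs):
    """Shannon entropy of the mean routing distribution (higher = more uniform)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_grow(loss_history, route_probs, prev_route_probs,
                stall_eps=1e-3, entropy_min=0.5, imbalance_max=4.0, drift_max=0.2):
    """Hypothetical composite growth trigger: expand capacity only when the
    loss has stagnated AND routing looks healthy (not collapsed, balanced,
    and stable between checks). Thresholds are illustrative assumptions."""
    # 1) Stagnation: recent loss improvement below a small threshold.
    stalled = len(loss_history) >= 2 and (loss_history[-2] - loss_history[-1]) < stall_eps
    # 2) Entropy: routing has not collapsed onto a single expert.
    healthy_entropy = routing_entropy(route_probs) > entropy_min
    # 3) Imbalance: max/min expert-load ratio stays bounded.
    balanced = max(route_probs) / max(min(route_probs), 1e-9) < imbalance_max
    # 4) Instability: routing-distribution drift between checks stays small.
    drift = sum(abs(a - b) for a, b in zip(route_probs, prev_route_probs))
    stable = drift < drift_max
    return stalled and healthy_entropy and balanced and stable
```

Gating growth on all four signals at once is what prevents the failure mode the abstract targets: capacity is never added while routing is collapsed or oscillating, so growth cannot amplify instability.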
Ablation studies show that these mechanisms reliably prevent routing instability during growth.
Like KADA,
DHKR places the Knowledge Steering Layer (KSL) immediately below the LM head and inherits KADA's architecture, enabling efficient parallel routing while keeping the 4‑bit backbone frozen.
Experiments show that DHKR improves calibration (ECE 0.02 vs. KADA 0.13) and lowers training cost (5.67 sec/iter vs. AdaLoRA 15.00 sec/iter), demonstrating both robustness and practical efficiency.
DHKR provides a unified design for dynamic, knowledge-aligned adaptation that uncovers knowledge "jackpots" while maintaining routing and calibration stability.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 12102