Keywords: Gradual Domain Adaptation, distribution shift, Neural Tangent Kernel, Out-of-distribution Generalization
TL;DR: One kernel, two roles: short-time NTK yields a differentiable NTK-MMD for smooth alignment and a utility score for per-sample weighting, enabling near-linear, single-pass GDA.
Abstract: Gradual Domain Adaptation (GDA) bridges large distribution shifts through intermediate domains, yet existing methods suffer from high computational overhead and error accumulation. To address both problems, we propose GradNTK, a novel framework that employs the Neural Tangent Kernel (NTK) as one stone to hit two birds: the efficiency and robustness issues of GDA.
On one hand, by exploiting the short-time dynamics of wide neural networks, GradNTK instantiates an NTK-induced Maximum Mean Discrepancy (MMD) as a differentiable domain-alignment metric that enforces smooth transitions between adjacent domains while maintaining near-linear computational cost.
On the other hand, the same NTK dynamics yield a prospective utility function that weights source and target samples by their shift sensitivity, enabling curriculum-guided gradual adaptation while avoiding error accumulation.
Experiments on Portraits, Rotated MNIST, Color-Shift MNIST, and CIFAR-100-C demonstrate superior accuracy (e.g., 95.1% on Rotated MNIST and 99.5% on Color-Shift MNIST) while reducing training time by 1.8× compared to prior GDA methods.
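A minimal sketch, assuming PyTorch ≥ 2.0 (torch.func), of the two roles the abstract assigns to the NTK: an empirical NTK-induced MMD² between batches from adjacent domains, and per-sample weights derived from the same kernel. The tiny network, batch shapes, and the softmax-normalized cross-domain similarity used as a stand-in for the prospective utility function are illustrative assumptions, not the paper's exact formulation or its near-linear short-time estimator.

```python
import torch
from torch import nn
from torch.func import functional_call, vmap, jacrev

# Hypothetical small network standing in for the wide model; parameters are
# detached, so this sketch computes values rather than a training loss.
net = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 1))
params = {k: v.detach() for k, v in net.named_parameters()}

def f(p, x):
    # Scalar output for a single example (batch dim added/removed so that
    # jacrev produces per-sample Jacobians under vmap).
    return functional_call(net, p, (x.unsqueeze(0),)).squeeze(0)

def empirical_ntk(xa, xb):
    # K_{ij} = <grad_theta f(x_i), grad_theta f(x_j)>, summed over all parameters.
    ja = vmap(jacrev(f), (None, 0))(params, xa)  # leaves: [Na, out, *param_shape]
    jb = vmap(jacrev(f), (None, 0))(params, xb)  # leaves: [Nb, out, *param_shape]
    return sum(torch.einsum('iop,jop->ij', a.flatten(2), b.flatten(2))
               for a, b in zip(ja.values(), jb.values()))

def ntk_mmd2(xs, xt):
    # Biased empirical MMD^2 with the empirical NTK as the kernel.
    return (empirical_ntk(xs, xs).mean() + empirical_ntk(xt, xt).mean()
            - 2.0 * empirical_ntk(xs, xt).mean())

def utility_weights(xs, xt):
    # Hypothetical per-sample utility: softmax-normalized mean NTK similarity of
    # each source sample to the adjacent-domain batch (illustrative only).
    return torch.softmax(empirical_ntk(xs, xt).mean(dim=1), dim=0)

xs, xt = torch.randn(16, 32), torch.randn(16, 32)  # source / adjacent-domain batches
print(ntk_mmd2(xs, xt).item(), utility_weights(xs, xt))
```

Both quantities reuse a single kernel evaluation, mirroring the "one kernel, two roles" structure described in the TL;DR; the paper's short-time-dynamics approximation and curriculum schedule are not reproduced here.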
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 4746