Keywords: Efficient Learning, Energy Saving
TL;DR: We propose a dynamic channel selection algorithm for transfer learning under a strict memory constraint.
Abstract: Memory-efficient training of deep neural networks has become increasingly important as models grow larger while deployment environments impose strict resource constraints. We propose TraDy, a novel transfer learning scheme built on two key insights: (i) layer importance for updates is architecture-dependent and can be determined a priori, and (ii) dynamic stochastic channel selection provides a better gradient approximation than static selection. Concretely, we stochastically resample the channels to update between epochs within the preselected layers. Extensive experiments demonstrate that TraDy achieves state-of-the-art performance across various downstream tasks and architectures while respecting strict memory constraints, reaching up to 99\% activation sparsity, 95\% weight derivative sparsity, and a 97\% reduction in FLOPs for weight derivative computation.
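Below is a minimal sketch (not the authors' released code) of the dynamic channel selection idea described in the abstract: at each epoch boundary, a random subset of output channels in the preselected layers is resampled, and gradients for the remaining channels are masked so only the selected channels are updated. The layer names, the `keep_ratio` parameter, and the gradient-hook masking mechanism are illustrative assumptions.

```python
import torch
import torch.nn as nn


def resample_channel_masks(model, layer_names, keep_ratio=0.05):
    """For each preselected conv layer, randomly pick a subset of output
    channels to update this epoch and zero the gradients of the rest."""
    handles = []
    for name, module in model.named_modules():
        if name not in layer_names or not isinstance(module, nn.Conv2d):
            continue
        out_ch = module.out_channels
        k = max(1, int(keep_ratio * out_ch))
        selected = torch.randperm(out_ch)[:k]
        mask = torch.zeros(out_ch, dtype=module.weight.dtype)
        mask[selected] = 1.0
        # Broadcast the mask over the (out_ch, in_ch, kH, kW) weight gradient.
        mask = mask.view(-1, 1, 1, 1)

        def hook(grad, mask=mask):
            return grad * mask.to(grad.device)

        handles.append(module.weight.register_hook(hook))
    return handles  # keep handles so hooks can be removed before resampling


if __name__ == "__main__":
    model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Conv2d(64, 64, 3))
    preselected = {"0", "2"}  # layers assumed chosen a priori (architecture-dependent)
    for epoch in range(3):
        handles = resample_channel_masks(model, preselected, keep_ratio=0.1)
        # ... one epoch of standard training here ...
        for h in handles:
            h.remove()  # clear masks before resampling at the next epoch
```

The per-epoch resampling is what distinguishes this from a static selection scheme, in which the channel subset would be fixed once before training.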
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 24494