Keywords: DNN, Critical Learning Period
Abstract: Deep neural networks (DNNs) exhibit critical learning periods (CLPs) during early training phases, when exposure to defective data can permanently impair model performance. The prevalent understanding of such periods, primarily based on the interpretation of Fisher Information (FI), attributes CLPs to the memorization phase. However, our theoretical and empirical study shows that such explanations of CLPs are inaccurate because they misunderstand the relationship between FI and model memorization. We therefore revisit CLPs in DNNs from the information-theoretic and optimization perspectives to gain a more accurate understanding of them.
We visualize model memorization dynamics and observe that CLPs extend beyond the memorization phase. Additionally, we introduce the concept of the effective gradient, a novel metric that quantifies the actual influence of each training epoch on the optimization trajectory. Our empirical and theoretical analyses reveal that the norm of effective gradients generally diminishes over training epochs and eventually converges to zero, highlighting the disproportionately larger impact of initial training on final model outcomes. This insight also clarifies the mechanism behind permanent performance degradation due to defective initial training: the model becomes trapped in a suboptimal region of parameter space. Our work offers a novel and in-depth understanding of CLPs and sheds light on enhancing model performance and robustness through such periods.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 7896