Abstract: Random label noises (or observational noises) widely exist in practical machine learning settings. We analyze the learning dynamics of stochastic gradient descent (SGD) over the quadratic loss with unbiased label noises, and investigate a new noise term in the dynamics, driven jointly by mini-batch sampling and the random label noises, that acts as an implicit regularizer. Our theoretical analysis finds that this implicit regularizer favors convergence points that stabilize model outputs against perturbations of the parameters. To validate our analysis, we use our theorems to estimate the closed-form solution of the implicit regularizer over continuous-time SGD dynamics for Ordinary Least Squares (OLS), where numerical simulation backs up our estimates. We further extend our proposals to interpret recent noisy self-distillation tricks for deep learning, where the implicit regularizer demonstrates a unique capacity for selecting models with improved output stability through learning from well-trained teachers with additive unbiased random label noises.
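As a rough illustration of the setting described in the abstract (a minimal sketch of our own, not code from the paper; all names, constants, and the instrumentation below are assumptions), the snippet runs mini-batch SGD on an OLS problem where each sampled label receives fresh zero-mean noise at every step. Because the noise is unbiased, the expected gradient is unchanged and the stationary mean of the iterates stays near the closed-form OLS solution, while the extra noise term inflates the fluctuation of the dynamics:

```python
import numpy as np

rng = np.random.default_rng(0)

# OLS setup: the quadratic loss 0.5 * ||Xw - y||^2 has the
# closed-form minimizer w* = (X^T X)^{-1} X^T y.
n, d = 200, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)
w_star = np.linalg.solve(X.T @ X, X.T @ y)

def sgd_with_label_noise(noise_std, batch=8, lr=0.01, steps=50_000):
    """Mini-batch SGD where each sampled label gets fresh zero-mean
    (unbiased) noise: the expected gradient is unchanged, but the
    dynamics pick up an additional noise term."""
    w = np.zeros(d)
    tail = []  # iterates after burn-in, to inspect the stationary behavior
    for t in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        y_noisy = y[idx] + noise_std * rng.normal(size=batch)
        grad = X[idx].T @ (X[idx] @ w - y_noisy) / batch
        w -= lr * grad
        if t > steps // 2:
            tail.append(w.copy())
    return np.array(tail)

for sigma in (0.0, 0.5):
    tail = sgd_with_label_noise(noise_std=sigma)
    bias = np.linalg.norm(tail.mean(axis=0) - w_star)
    spread = tail.std(axis=0).mean()
    # Unbiased noise keeps the mean near w*, but widens the fluctuation.
    print(f"sigma={sigma}: |mean - w*| = {bias:.4f}, avg std = {spread:.4f}")
```

The contrast between the two runs is the point: with sigma=0 the iterates converge to w* and the spread collapses, while with sigma=0.5 the mean stays near w* but the iterates fluctuate, which is the extra noise term whose regularizing effect the paper analyzes.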
One-sentence Summary: Random label noises act as an implicit regularizer of SGD and help the learning procedure select a stable solution.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=jL_TF5J1yy