Memorization and the Orders of Loss: A Learning Dynamics Perspective

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Memorization, Mislabeled Sample Detection, Learning Dynamics
Abstract: Deep learning has become the de facto approach for nearly all learning tasks. It has been observed that deep models tend to memorize and sometimes overfit data, which can compromise performance, privacy, and other critical properties. In this paper, we explore the theoretical foundations that connect memorization to various orders of sample loss, i.e., the sample loss, its gradient, and its curvature, focusing on learning dynamics to understand what these models memorize and how. To this end, we introduce two proxies for memorization: Cumulative Sample Loss (CSL) and Cumulative Sample Gradient (CSG). CSL is the loss of a sample accumulated throughout training, while CSG is the gradient of the sample loss with respect to the input, aggregated over the training process. Both proxies agree closely with stability-based memorization, as evidenced by high cosine similarity scores. We develop the theory behind these results, showing that CSL and CSG bound stability-based memorization and learning time. We further extend this framework to include sample loss curvature, connecting all three orders, namely sample loss, sample loss gradient, and sample loss curvature, to learning time and memorization. The proposed proxy, CSL, is four orders of magnitude less computationally expensive than the stability-based method and can be obtained with zero additional overhead during training. We demonstrate the practical utility of the proposed proxies in identifying mislabeled samples and detecting duplicates, where our metric achieves state-of-the-art performance. This paper thus provides a new tool for analyzing data as it grows in size, making it a valuable resource in practical applications.
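The definitions above lend themselves to a straightforward implementation. Below is a minimal, hypothetical PyTorch sketch of how CSL and CSG could be accumulated inside an ordinary training loop; the function name, the loader yielding sample indices, and the use of the input-gradient norm as the aggregated quantity are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (not the authors' implementation) of the CSL / CSG
# definitions from the abstract, assuming a standard PyTorch classification
# loop. The loader is assumed to yield (input, label, sample_index) so that
# per-sample statistics can be accumulated across epochs.
import torch
import torch.nn.functional as F

def train_with_csl_csg(model, loader, optimizer, num_epochs, num_samples):
    csl = torch.zeros(num_samples)  # Cumulative Sample Loss
    csg = torch.zeros(num_samples)  # Cumulative Sample Gradient (input-grad norm)

    for _ in range(num_epochs):
        for x, y, idx in loader:
            x.requires_grad_(True)
            per_sample_loss = F.cross_entropy(model(x), y, reduction="none")

            # CSG: gradient of each sample's loss w.r.t. its own input.
            # Summing before differentiating gives exact per-sample input
            # gradients when samples do not interact in the forward pass
            # (e.g., no batch normalization).
            (input_grad,) = torch.autograd.grad(
                per_sample_loss.sum(), x, retain_graph=True
            )
            csg[idx] += input_grad.flatten(1).norm(dim=1).detach()

            # CSL: the per-sample losses are already computed for the
            # training step, so accumulating them adds no overhead.
            csl[idx] += per_sample_loss.detach()

            optimizer.zero_grad()
            per_sample_loss.mean().backward()
            optimizer.step()

    return csl, csg
```

In this sketch, CSL comes for free from the per-sample losses already needed for backpropagation, consistent with the abstract's zero-overhead claim, whereas CSG requires one extra input-gradient computation per batch.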
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8186