Does ‘Deep Learning on a Data Diet’ reproduce? Overall yes, but GraNd at Initialization does not

Published: 28 Sept 2023, Last Modified: 27 Oct 2023Accepted by TMLREveryoneRevisionsBibTeX
Authors that are also TMLR Expert Reviewers: ~Andreas_Kirsch1
Abstract: Training deep neural networks on vast datasets often results in substantial computational demands, underscoring the need for efficient data pruning. In this context, we critically re-evaluate the data pruning metrics introduced in `Deep Learning on a Data Diet' by Paul et al. (2021): the Gradient Norm (GraNd) (at initialization) and the Error L2 Norm (EL2N). Our analysis uncovers a strong correlation between the GraNd scores at initialization and a sample's input norm, suggesting the latter as a potential baseline for data pruning. However, comprehensive tests on CIFAR-10 show neither metric outperforming random pruning, contradicting one of the findings in Paul et al. (2021). We pinpoint the inconsistency in the GraNd at initialization results to a later-fixed bug in FLAX's checkpoint restoring mechanism ( Altogether, our findings do not support using the input norm or GraNd scores at initialization for effective data pruning. Nevertheless, EL2N and GraNd scores at later training epochs do provide useful pruning signals, aligning with the expected performance.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Certifications: Expert Certification, Reproducibility Certification
Assigned Action Editor: ~Caglar_Gulcehre1
Submission Number: 997