Does ‘Deep Learning on a Data Diet’ reproduce? Overall yes, but GraNd at Initialization does not

Andreas Kirsch

Published: 28 Sept 2023, Last Modified: 17 Sept 2024, Accepted by TMLR, CC BY 4.0

Authors that are also TMLR Expert Reviewers: ~Andreas_Kirsch1

Abstract: Training deep neural networks on vast datasets is computationally expensive, which motivates efficient data pruning. In this context, we critically re-evaluate the data pruning metrics introduced in ‘Deep Learning on a Data Diet’ by Paul et al. (2021): the Gradient Norm (GraNd), computed at initialization, and the Error L2 Norm (EL2N). Our analysis uncovers a strong correlation between GraNd scores at initialization and a sample's input norm, suggesting the latter as a potential baseline for data pruning. However, comprehensive tests on CIFAR-10 show that neither GraNd at initialization nor the input norm outperforms random pruning, contradicting one of the findings in Paul et al. (2021). We trace the inconsistency in the GraNd at initialization results to a later-fixed bug in FLAX's checkpoint restoring mechanism (https://github.com/google/flax/commit/28fbd95500f4bf2f9924d2560062fa50e919b1a5). Altogether, our findings do not support using the input norm or GraNd scores at initialization for effective data pruning. Nevertheless, EL2N and GraNd scores at later training epochs do provide useful pruning signals, in line with the expected performance.
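
As a rough illustration of the quantities named in the abstract, the sketch below computes an EL2N-style score (the L2 norm of the softmax output minus the one-hot label) and the input norm for a single sample, then selects a subset by score. It is a minimal, dependency-free sketch and not the code from the linked repositories: the function names are hypothetical, the score is computed for a single model rather than averaged over several, and the selection rule (keep the highest-scoring samples) is an assumption made for illustration. GraNd is omitted because it requires per-sample gradients of an actual model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def el2n_score(logits, label):
    """L2 norm of (softmax(logits) - one_hot(label)) for one sample.

    Hypothetical helper: a single-model score, with no averaging
    over independently trained models.
    """
    probs = softmax(logits)
    return math.sqrt(
        sum((p - (1.0 if k == label else 0.0)) ** 2 for k, p in enumerate(probs))
    )


def input_norm(x):
    """L2 norm of a flattened input, the baseline mentioned in the abstract."""
    return math.sqrt(sum(v * v for v in x))


def keep_highest(scores, keep_fraction):
    """Return sorted indices of the highest-scoring samples.

    Assumption for illustration: pruning keeps the samples with the
    largest scores and discards the rest.
    """
    n_keep = max(1, round(len(scores) * keep_fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_keep])
```

Under these assumptions, a random-pruning baseline would pick the same number of indices uniformly at random, which is the comparison the abstract reports for CIFAR-10.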

Submission Length: Regular submission (no more than 12 pages of main content)

Code: https://github.com/blackhc/pytorch_datadiet https://github.com/blackhc/data_diet

Certifications: Expert Certification, Reproducibility Certification

Assigned Action Editor: ~Caglar_Gulcehre1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 997
