Why do Fine-grained Labels in Pretraining Benefit Generalization?

TMLR Paper 2978 Authors

08 Jul 2024 (modified: 17 Sept 2024) · Decision pending for TMLR · CC BY 4.0
Abstract: Recent literature shows that if a deep neural network is pretrained on fine-grained labeled data and then fine-tuned on coarse-labeled data for downstream tasks, its generalization performance is often better than that obtained by pretraining on coarse-labeled data. While empirical evidence supporting this finding is abundant, a theoretical justification remains an open problem. This paper addresses the problem by introducing a "hierarchical multi-view" structure to constrain the input data distribution. Under this data assumption, we prove that 1) coarse-grained pretraining only allows a neural network to learn the common features well, while 2) fine-grained pretraining helps the network learn the rare features in addition to the common ones, thus improving its accuracy on hard downstream test samples.
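As a concrete illustration of the setup the abstract describes (pretrain with fine-grained vs. coarse labels, then fine-tune on coarse labels), here is a minimal sketch in PyTorch. It is not the authors' code: the synthetic data, the two-layer backbone, the 4-to-2 label hierarchy, and all hyperparameters are illustrative assumptions chosen only to make the comparison runnable end to end.

```python
# Minimal sketch (illustrative assumptions, not the paper's protocol):
# pretrain a backbone with either coarse or fine-grained labels, then
# freeze it and fit a new head on coarse labels for the downstream task.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data: 4 fine-grained classes that collapse into 2 coarse classes.
n, d = 1024, 32
x = torch.randn(n, d)
fine_y = torch.randint(0, 4, (n,))
coarse_y = fine_y // 2  # fine classes {0,1} -> coarse 0, {2,3} -> coarse 1

def make_backbone():
    return nn.Sequential(nn.Linear(d, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU())

def pretrain(num_labels, labels, epochs=50):
    """Pretrain a backbone with a classification head over `num_labels` classes."""
    backbone, head = make_backbone(), nn.Linear(64, num_labels)
    opt = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(head(backbone(x)), labels).backward()
        opt.step()
    return backbone

def finetune_and_eval(backbone, epochs=50):
    """Freeze the backbone, fit a fresh coarse-label head, report accuracy."""
    for p in backbone.parameters():
        p.requires_grad_(False)
    head = nn.Linear(64, 2)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(head(backbone(x)), coarse_y).backward()
        opt.step()
    return (head(backbone(x)).argmax(dim=1) == coarse_y).float().mean().item()

coarse_backbone = pretrain(2, coarse_y)  # pretrain with coarse labels
fine_backbone = pretrain(4, fine_y)      # pretrain with fine-grained labels
print("coarse-pretrained accuracy:", finetune_and_eval(coarse_backbone))
print("fine-pretrained accuracy:  ", finetune_and_eval(fine_backbone))
```

On this toy data the two backbones may perform similarly; the paper's claim concerns the learned features under the "hierarchical multi-view" assumption, where fine-grained pretraining additionally captures rare features that matter on hard downstream samples.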
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Dmitry_Kangin1
Submission Number: 2978