Keywords: deep learning, OOD generalization, signal propagation, hyper-parameter search
TL;DR: Architectural design enabling the training loss to predict OOD generalization when performing model selection
Abstract: Traditional model selection in deep learning relies on carefully tuning several hyper-parameters (HPs) that control regularization strength on held-out validation data. Such validation data can be difficult to obtain in scarce-data scenarios and, under distribution shift, may not reflect real-world deployment conditions.
Motivated by these issues, this paper investigates the potential of using the training loss alone to predict the generalization performance of neural networks in out-of-distribution (OOD) test scenarios.
Our analysis reveals that preserving consistent prediction variance across training and testing distributions is essential for establishing a correlation between training loss and OOD generalization.
We propose architectural adjustments that ensure $\textit{variance preservation}$, enabling reliable model selection based on training loss alone, even in over-parameterized settings where the parameter-to-sample ratio exceeds four orders of magnitude.
We extensively assess the model-selection capabilities of $\textit{variance-preserving}$ architectures on several scarce-data, domain-shift, and corruption benchmarks by optimizing HPs such as learning rate, weight decay, batch size, and data augmentation strength.
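The abstract does not spell out the specific architectural adjustments; as background on the general idea of variance preservation, the following NumPy sketch shows a standard mechanism with the same goal: choosing the weight scale (He-style initialization for ReLU, i.e. $\mathrm{Var}(W) = 2/\text{fan\_in}$) so that pre-activation variance stays roughly constant through a deep stack rather than exploding or vanishing. The function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def preact_variances(n_layers=10, fan=1024, batch=256, gain=2.0):
    """Track pre-activation variance through a deep ReLU stack.

    He-style initialization (Var(W) = gain / fan_in, with gain=2 for ReLU)
    keeps the pre-activation variance roughly constant from layer to layer;
    with gain=1 it would shrink by half at every layer.
    """
    x = rng.normal(size=(fan, batch))  # unit-variance inputs
    variances = []
    for _ in range(n_layers):
        W = rng.normal(0.0, np.sqrt(gain / fan), size=(fan, fan))
        y = W @ x                   # pre-activation
        variances.append(float(y.var()))
        x = np.maximum(y, 0.0)      # ReLU
    return variances

vs = preact_variances()
# All ten layers keep an O(1) pre-activation variance (analytically ~2 here).
print(all(1.0 < v < 4.0 for v in vs))
```

Variance preservation in this sense is a property of the forward pass at initialization; the paper's contribution concerns maintaining consistent prediction variance across training and test distributions, for which this sketch is only a simplified analogue.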
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10352