Generalization Gradient Descent

18 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: out-of-distribution, generalization problem, gradient descent
TL;DR: n/a
Abstract: We propose a new framework for evaluating the relationship between features and generalization via a theoretical analysis of the out-of-distribution (OOD) generalization problem. The framework combines two mathematical tools: a generalization ratio, which quantitatively characterizes the degree of generalization, and a generalization decision process (GDP), which formalizes the relationship between losses on seen and unseen domains. By combining the notions of informativeness and variation in the generalization ratio and relating them to the OOD problem, we derive a generalization inequality. We then incorporate this inequality into the GDP to select the best loss from the seen domains for gradient descent during backpropagation. When the classifier is a fully connected neural network, the entire system is trained end-to-end with backpropagation; no model-selection criterion or manipulation of gradients is needed during training. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of generalization ability.
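The abstract does not define the selection rule formally, but the GDP's loss-selection step can be sketched as follows. This is a minimal, hypothetical illustration in PyTorch, not the authors' actual method: the classifier (a small fully connected network, as the abstract mentions) produces one loss per seen domain, a stand-in selection rule (here, picking the worst-case per-domain loss in place of the paper's generalization-inequality criterion) chooses which loss to backpropagate, and standard gradient descent updates the weights. The network sizes, toy domains, and selection rule are all assumptions.

```python
# Hypothetical sketch of the GDP loss-selection step described in the abstract.
# Assumptions (not from the paper): the selection criterion is a simple max
# over per-domain losses, standing in for the generalization inequality; the
# classifier is a small fully connected network on synthetic data.
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Toy "seen domains": each domain is an (inputs, labels) batch.
domains = [
    (torch.randn(32, 16), torch.randint(0, 3, (32,)))
    for _ in range(4)
]

for step in range(100):
    optimizer.zero_grad()
    # One loss per seen domain.
    domain_losses = [loss_fn(classifier(x), y) for x, y in domains]
    # Select a single loss to backpropagate; the paper's criterion is the
    # generalization inequality, approximated here by the worst-case loss.
    selected = torch.stack(domain_losses).max()
    selected.backward()
    optimizer.step()
```

Because only the selected domain's loss is backpropagated at each step, this matches the abstract's claim that training needs no separate model-selection criterion and no direct operation on gradients.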
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 12437