Towards the Decisive Factor of Symbolic Generalization of DNNs

Submitted to ICLR 2026. 16 Sept 2025 (modified: 11 Feb 2026). License: CC BY 4.0
Keywords: Model Generalization, Overfitting, Deep Learning Theory
TL;DR: This study shows that the randomness of parameter initialization in a DNN's low layers determines the composition of its confusing samples.
Abstract: Identifying the decisive factor that drives deep neural networks (DNNs) to learn non-generalizable representations (*i.e.,* non-generalizable interactions between input variables) has been a persistent challenge in the study of symbolic generalization. In this paper, we quantify the generalization power of the interactions encoded by DNNs, and we discover that DNNs usually learn non-generalizable interactions from a small set of samples, referred to as *confusing samples*. The emergence of confusing samples during training explains the overfitting of a DNN. We further discover that the composition of confusing samples is determined by the randomness of parameter initialization in the low layers of a DNN, whereas other factors, such as high-layer parameters and network architecture, have much less impact. Consequently, two DNNs initialized with different low-layer parameters eventually learn entirely different sets of confusing samples, even when they achieve similar performance.
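The abstract's central claim can be probed with a toy experiment. The sketch below is purely illustrative and is not the paper's interaction-based metric: it trains pairs of small NumPy MLPs that share or differ in their low-layer initialization seed, uses the highest-loss training points as a crude proxy for "confusing samples" (an assumption, not the paper's definition), and compares the overlap of those sets. All function names, the synthetic task, and hyperparameters are hypothetical choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: binary label defined by a simple pairwise interaction.
n, d = 200, 10
X = rng.normal(size=(n, d))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

def train_mlp(seed_low, seed_high, hidden=32, lr=0.1, steps=500):
    """Two-layer MLP: low-layer weights from seed_low, high-layer from seed_high."""
    r_low = np.random.default_rng(seed_low)
    r_high = np.random.default_rng(seed_high)
    W1 = r_low.normal(scale=0.5, size=(d, hidden))   # low-layer parameters
    b1 = np.zeros(hidden)
    W2 = r_high.normal(scale=0.5, size=(hidden, 1))  # high-layer parameters
    b2 = np.zeros(1)
    for _ in range(steps):
        h = np.maximum(X @ W1 + b1, 0.0)             # ReLU hidden layer
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))     # sigmoid output
        g = (p - y[:, None]) / n                     # dL/dlogit for mean BCE
        gW2 = h.T @ g
        gh = (g @ W2.T) * (h > 0)
        gW1 = X.T @ gh
        W2 -= lr * gW2; b2 -= lr * g.sum(0)
        W1 -= lr * gW1; b1 -= lr * gh.sum(0)
    h = np.maximum(X @ W1 + b1, 0.0)
    return (1.0 / (1.0 + np.exp(-(h @ W2 + b2)))).ravel()

def hardest_samples(p, k=20):
    """Crude proxy for 'confusing samples': the k highest-loss training points."""
    loss = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    return set(np.argsort(loss)[-k:])

def jaccard(a, b):
    return len(a & b) / len(a | b)

# Claimed trend: shared low-layer init -> similar confusing sets;
# different low-layer init -> divergent sets, even at similar accuracy.
s_same = jaccard(hardest_samples(train_mlp(1, 10)),
                 hardest_samples(train_mlp(1, 11)))
s_diff = jaccard(hardest_samples(train_mlp(1, 10)),
                 hardest_samples(train_mlp(2, 10)))
print(f"overlap (same low-layer init):      {s_same:.2f}")
print(f"overlap (different low-layer init): {s_diff:.2f}")
```

On a toy task of this size the gap between the two overlap scores may be noisy; the paper's result concerns the interaction-based measure on real DNNs, which this proxy only gestures at.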
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 6746