Abstract: In this study we address the conundrum of the success of over-parameterized models by examining the complex relationship between parameter space and output space.
We classify key parameter sets related to generalization and training in parametric basis expansion machine learning models. Methods ranging from linear regression and extreme learning machines to neural networks fall into this category. We also classify these parametric models into identifiable and non-identifiable models according to the mapping from parameter space to function space. Such a classification of models is already present in the literature, but it is usually studied in Bayesian ML and statistics. We focus on identifiable models in this article.
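To illustrate the distinction, here is a minimal sketch (the architecture, sizes, and permutation are illustrative choices of ours, not taken from the paper): a one-hidden-layer network is non-identifiable because permuting its hidden units changes the parameters but not the function, so the map from parameter space to function space is not injective.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 1))

# A one-hidden-layer tanh network with 3 hidden units (illustrative sizes).
W1, b1, w2 = rng.normal(size=(1, 3)), rng.normal(size=3), rng.normal(size=3)

def net(x, W1, b1, w2):
    return np.tanh(x @ W1 + b1) @ w2

# Relabel the hidden units: a different point in parameter space...
perm = [2, 0, 1]
same = np.allclose(net(x, W1, b1, w2),
                   net(x, W1[:, perm], b1[perm], w2[perm]))
print(same)  # True: ...but the very same function, hence non-identifiable
```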
We later classify generalization into strict and weak generalization according to learning in parameter space for fixed basis regression models, which fall into the category of identifiable models. Strict generalization occurs when the true parameters of the ground truth (or their unidentifiable counterparts) are learned, while weak generalization occurs when they are not learned but local generalization is still achieved. We present the conditions needed for strict generalization in fixed basis regression settings trained using pseudo-inverse methods. We show that strict generalization cannot be achieved in over-parameterized regimes trained through the pseudo-inverse method, and that approaching strict generalization using gradient descent can depend entirely on initialization and randomness. This supports the classical idea that over-parameterization is harmful, while emphasizing that it applies to the strict generalization case. However, weak generalization can always be achieved in over-parameterized regimes under certain conditions. We thus study the complex relationship between generalization in output space and the parameter space to understand the conundrum of the success of over-parameterized models, and we try to weave a coherent and consistent picture of the same.
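A minimal numerical sketch of the pseudo-inverse claim follows (the polynomial basis and the problem sizes are our own illustrative assumptions, not those fixed in the paper): with more basis functions than training samples, the minimum-norm pseudo-inverse solution interpolates the training labels yet fails to recover the ground-truth parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed basis regression f(x) = sum_j theta_j * phi_j(x) with a
# polynomial basis phi_j(x) = x**j (an illustrative choice of basis).
def design(x, m):
    return np.vander(x, m, increasing=True)  # n x m design matrix

theta_true = np.array([1.0, -2.0, 0.5])       # ground truth uses 3 basis functions
x_train = rng.uniform(-1.0, 1.0, size=5)
y_train = design(x_train, 3) @ theta_true     # noiseless training labels

# Over-parameterized regime: 10 basis functions, only 5 samples.
Phi = design(x_train, 10)
theta_hat = np.linalg.pinv(Phi) @ y_train     # minimum-norm pseudo-inverse fit

print(np.allclose(Phi @ theta_hat, y_train))   # True: training data interpolated
print(np.allclose(theta_hat[:3], theta_true))  # False: true parameters not recovered
```

Any interpolating parameter vector fits the training data equally well, so the pseudo-inverse returns the minimum-norm one, which generically differs from the zero-padded ground truth.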
Later, we study generalization performance under label noise for the distinct scenarios identified in this article. We include insights into the theory of deep learning and quantum machine learning.
Our work serves as a refinement of the idea of generalization and provides insights through proofs and, in some cases, demonstrations. Our focus in this article is purely taxonomical and conceptual rather than driven by the introduction of new metrics.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Version 1.
Removed a GitHub link for anonymity, in order to comply with the double-blind review policy.
No other changes to the manuscript content.
Version 2.
Improved the introduction and conclusion and reduced redundancy in the presentation. Refined the related works section.
Introduced the narrative of identifiable and non-identifiable models. Identifiable models are models where there is an injective mapping between parameter space and function space, an important distinction left out in the previous version. Improved the definition of strict generalization to remove any ambiguity between identifiable and non-identifiable models. Included Appendix A to discuss the identifiability of the fixed basis regression model.
The FBR models are identifiable models and neural networks are non-identifiable models. We retain the ideas of strict generalization and weak generalization but rephrase the conditions of Theorem 3.1 to apply only to FBR models trained using the pseudo-inverse.
Improved the statement of claims to avoid any misinterpretation.
Added a section on implicit regularization and the classification of parameter sets in the model parameter space, namely Section 6. Fig. 12 is an additional figure. Included a discussion of the convexity and non-convexity questions in this section.
Assigned Action Editor: ~Russell_Tsuchida1
Submission Number: 5790