Modify Training Direction in Function Space to Reduce Generalization Error

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Neural tangent kernel, Generalization enhancement, Natural gradient
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Leveraging neural tangent kernel theory, we theoretically analyze a modified natural gradient descent method in neural network function space and show that modifying the training direction reduces the generalization error.
Abstract: To improve generalization performance by modifying the training dynamics, we present theoretical analyses of a modified natural gradient descent method in neural network function space, leveraging neural tangent kernel theory. First, we derive an analytical expression for the function learned by this modified natural gradient descent under the assumptions of an infinite network width and a Gaussian conditional output distribution. We then explicitly derive the generalization error of the learned function. Interpreting the generalization error as stemming from the distribution discrepancy between the training data and the true data, we propose a criterion for modifying the training direction in the eigenspaces of the Fisher information matrix to reduce the generalization error bound. Through this approach, we establish that modifying the training direction of the neural network in function space leads to a reduction in generalization error. These theoretical results are illustrated through numerical experiments, and we demonstrate connections between this framework and existing generalization-enhancing methods.
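
As a reading aid, the sketch below illustrates the kind of eigenspace modification the abstract describes. It is a minimal numerical toy, not the authors' method: in the infinite-width NTK regime with squared loss, function-space gradient descent on the training set follows df/dt = -Θ(f - y) for the empirical kernel Θ, whose eigenspaces coincide with those of the Fisher information matrix under a Gaussian conditional output distribution. The function `modified_ntk_step`, the `modify` rule, and the random kernel here are all illustrative assumptions, not names or choices from the paper.

```python
import numpy as np

def modified_ntk_step(f, y, ntk, modify, lr=0.1):
    """One function-space gradient step with per-eigenspace rescaling.

    f      : current network outputs on the training inputs, shape (n,)
    y      : training targets, shape (n,)
    ntk    : empirical neural tangent kernel matrix, shape (n, n)
    modify : callable mapping eigenvalues -> modification factors
    """
    # Eigendecompose the symmetric PSD kernel; under a Gaussian
    # conditional output distribution, its eigenspaces match those
    # of the Fisher information matrix in function space.
    eigvals, eigvecs = np.linalg.eigh(ntk)

    # Unmodified NTK dynamics would apply Theta @ (f - y) directly.
    residual = f - y

    # Modify the training direction eigenspace-by-eigenspace:
    # project the residual onto the eigenbasis, rescale each
    # component, then map back.
    coeffs = eigvecs.T @ residual
    update = eigvecs @ (modify(eigvals) * eigvals * coeffs)

    return f - lr * update

# Toy usage with a random PSD "kernel" and a rule that damps
# small-eigenvalue directions (one hypothetical criterion,
# purely for illustration).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 20))
ntk = A @ A.T / 20
y = rng.normal(size=20)
f = np.zeros(20)
for _ in range(100):
    f = modified_ntk_step(f, y, ntk, modify=lambda lam: lam / (lam + 1e-2))
```

With `modify` identically equal to one, the step reduces to standard NTK gradient descent; the damping rule shown is just one plausible choice, whereas the paper derives its modification criterion from the generalization error bound itself.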
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7621