Parameter Efficient Neural Networks With Singular Value Decomposed Kernels

Published: 01 Jan 2023, Last Modified: 06 Nov 2023. IEEE Trans. Neural Networks Learn. Syst., 2023.
Abstract: Traditionally, neural networks are viewed from the perspective of connected neuron layers represented as matrix multiplications. We propose to compose these weight matrices from a set of orthogonal basis matrices by treating them as elements of the vector space of real matrices under addition and multiplication. Using the Kronecker product for vectors, this composition is unified with the singular value decomposition (SVD) of the weight matrix. The orthogonal components of this SVD are trained along a descent curve on the Stiefel manifold using the Cayley transform. Next, update equations for the singular values and initialization routines are derived. Finally, acceleration of stochastic gradient descent optimization under this formulation is discussed. Our proposed method allows more parameter-efficient representations of weight matrices in neural networks. These decomposed weight matrices achieve maximal performance in both standard and more complicated neural architectures. Furthermore, the more parameter-efficient decomposed layers are shown to be less dependent on the optimization procedure and better conditioned. As a tradeoff, training time increases by up to a factor of 2. These observations are attributed to the properties of the method and to the choice of optimization over the manifold of orthogonal matrices.
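
As a rough illustration of the ingredients named in the abstract (an SVD-parameterized weight matrix and Cayley-transform updates on the Stiefel manifold), the following minimal sketch assumes PyTorch. The layer name SVDLinear, the helper cayley_step, and the manual update routine are hypothetical and not taken from the paper; they only show one plausible way the decomposition and the retraction could fit together.

import torch
import torch.nn as nn
import torch.nn.functional as F

def cayley_step(X, G, lr):
    # One step along the Cayley-transform descent curve: given a point X with
    # orthonormal columns and its Euclidean gradient G, build the skew-symmetric
    # generator A = G X^T - X G^T and return
    # Y(lr) = (I + lr/2 * A)^{-1} (I - lr/2 * A) X, which stays on the Stiefel manifold.
    A = G @ X.T - X @ G.T
    I = torch.eye(A.shape[0], device=X.device, dtype=X.dtype)
    return torch.linalg.solve(I + 0.5 * lr * A, (I - 0.5 * lr * A) @ X)

class SVDLinear(nn.Module):
    # Linear layer whose weight is stored in decomposed form W = U diag(s) V^T.
    def __init__(self, in_features, out_features):
        super().__init__()
        r = min(in_features, out_features)
        # Orthogonal initialization of the two Stiefel factors via QR.
        self.U = nn.Parameter(torch.linalg.qr(torch.randn(out_features, r))[0])
        self.V = nn.Parameter(torch.linalg.qr(torch.randn(in_features, r))[0])
        self.s = nn.Parameter(torch.ones(r))  # singular values, updated with a plain gradient step

    def forward(self, x):
        W = self.U @ torch.diag(self.s) @ self.V.T
        return F.linear(x, W)

    @torch.no_grad()
    def orthogonal_update(self, lr):
        # Replace the usual gradient step on U and V by a Cayley retraction so
        # both factors remain orthogonal after every update.
        if self.U.grad is not None:
            self.U.copy_(cayley_step(self.U, self.U.grad, lr))
        if self.V.grad is not None:
            self.V.copy_(cayley_step(self.V, self.V.grad, lr))
        if self.s.grad is not None:
            self.s -= lr * self.s.grad

In such a sketch, training would call loss.backward() as usual and then layer.orthogonal_update(lr) in place of a standard optimizer step for the decomposed parameters, so that the orthogonality of U and V is preserved throughout training.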