Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Oral · CC BY 4.0
TL;DR: We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime.
Abstract: We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies which provably lead to implicit regularization in gradient-descent methods. At the same time, it has been argued by Cohen et al. (2016) that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, encouraged by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated in an extensive set of numerical simulations showcasing the dynamics predicted by our theory as well as the crucial role of a small random initialization.
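The following is a minimal NumPy sketch of the setup described in the abstract (not the authors' implementation; see the linked repository for their code): gradient descent on an overparametrized tubal factorization X = A * B with a small random initialization, fitted to underdetermined Gaussian measurements of a low-tubal-rank tensor. All names, sizes, the measurement model, the step size, and the initialization scale are illustrative assumptions.

```python
# Minimal sketch (assumptions throughout): overparametrized tubal factorization
# trained by gradient descent from a small random initialization.
import numpy as np

def t_product(A, B):
    """Tubal (t-)product: FFT along the third (tube) mode, facewise matrix
    products in the Fourier domain, inverse FFT back to the spatial domain."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ijk,jlk->ilk', Ah, Bh), axis=2))

rng = np.random.default_rng(0)
n, n3, r_true, r_over = 20, 3, 1, 20   # tensor sizes, true and overparametrized tubal rank
m = 600                                # number of measurements (< n * n * n3 = 1200)
alpha, lr, steps = 1e-5, 0.2, 4000     # small initialization scale, step size, iterations

# Ground-truth tensor of tubal rank r_true, normalized so that its Fourier-domain
# frontal slices have spectral norm at most one.
T = t_product(rng.standard_normal((n, r_true, n3)),
              rng.standard_normal((r_true, n, n3)))
That = np.fft.fft(T, axis=2)
T /= max(np.linalg.svd(That[:, :, k], compute_uv=False)[0] for k in range(n3))

# Underdetermined Gaussian measurement operator and observations y = M vec(T).
M = rng.standard_normal((m, n * n * n3)) / np.sqrt(m)
y = M @ T.ravel()

# Overparametrized factors with a small random initialization.
A = alpha * rng.standard_normal((n, r_over, n3))
B = alpha * rng.standard_normal((r_over, n, n3))

for _ in range(steps):
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    X = np.real(np.fft.ifft(np.einsum('ijk,jlk->ilk', Ah, Bh), axis=2))  # X = A * B
    G = (M.T @ (M @ X.ravel() - y)).reshape(n, n, n3)  # gradient of the loss w.r.t. X
    Gh = np.fft.fft(G, axis=2)
    # Chain rule through the t-product: grad_A = G * B^T and grad_B = A^T * G, where the
    # tensor transpose acts as a slice-wise conjugate transpose in the Fourier domain.
    gA = np.real(np.fft.ifft(np.einsum('ijk,ljk->ilk', Gh, Bh.conj()), axis=2))
    gB = np.real(np.fft.ifft(np.einsum('jik,jlk->ilk', Ah.conj(), Gh), axis=2))
    A, B = A - lr * gA, B - lr * gB

# Despite the overparametrization (r_over = n), the learned tensor should end up close
# to the low-tubal-rank ground truth: only about r_true singular values per
# Fourier-domain frontal slice are significant.
X = t_product(A, B)
print("relative error:", np.linalg.norm(X - T) / np.linalg.norm(T))
Xh = np.fft.fft(X, axis=2)
svals = np.stack([np.linalg.svd(Xh[:, :, k], compute_uv=False) for k in range(n3)])
print("leading singular values per frontal slice:\n", np.round(svals[:, :5], 4))
```

Working slice-wise in the Fourier domain is the standard way to evaluate the t-product and gives exact gradients of the least-squares loss with respect to the factors; the effective tubal rank of A * B can be read off from the singular values of the Fourier-domain frontal slices.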
Lay Summary: Many machine learning models (including neural networks) often work well, even when they use far more parameters than data points. One mystery is why these models don’t overfit the training data but instead find simple solutions that fit unseen data points. A line of research known as implicit regularization shows that gradient descent (a common algorithm for training these models) naturally nudges the models towards simpler solutions, even without being explicitly told to avoid complicated solutions. We take a step towards the broader goal of understanding this paradigm by establishing this phenomenon for the problem of recovering a tensor (a multidimensional array of numbers) from limited observations. Like neural network training, our tensor recovery problem has many more parameters than necessary, and thus, many solutions exist. Yet, we show that gradient descent finds a “simple” low-rank tensor which fits the observations instead of one of the many “complicated” high-rank tensors which also fit. Hence, implicit regularization can be used to recover low-rank tensors, which naturally arise in a wide range of applications, including video processing and recommender systems. We expect that studying implicit regularization in a controlled setting will pave the way towards theoretical insights into training neural networks and designing better training algorithms in the future.
Link To Code: https://github.com/AnnaVeselovskaUA/tubal-tensor-implicit-reg-GD.git
Primary Area: Theory->Learning Theory
Keywords: overparameterization, implicit regularization, tensor factorization
Submission Number: 16047