Keywords: non-convex optimization, low-rank matrix optimization, matrix sensing, implicit bias, tensor, over-parametrization
TL;DR: We prove that optimizing over the tensor space with gradient descent also induces an implicit bias, making over-parameterized models tractable to use.
Abstract: Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization that promotes compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor optimization, particularly within the context of the lifted matrix sensing framework. This framework was recently proposed to address the non-convex matrix sensing problem by transforming spurious solutions into strict saddles when optimizing over symmetric, rank-1 tensors. We show that, with a sufficiently small initialization scale, GD applied to this lifted problem results in approximate rank-1 tensors and critical points with escape directions. Our findings underscore the significance of the tensor parametrization of matrix sensing, in combination with first-order methods, in achieving global optimality in such problems.
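To make the setting concrete, the sketch below runs GD on the base (unlifted) matrix sensing problem from a small random initialization, the small-initialization regime the abstract refers to. It is a minimal illustration, not the paper's method: the dimensions (n, r, m), step size eta, and initialization scale alpha are illustrative assumptions, and the lifted tensor parametrization studied in the paper is not reproduced here.

```python
import numpy as np

# Hypothetical problem sizes (assumptions, not from the paper):
# n x n matrix, true rank r, m linear measurements.
n, r, m = 10, 1, 60
rng = np.random.default_rng(0)

# Ground-truth rank-1 matrix M* = u* u*^T, normalized for stable step sizes.
u_star = rng.standard_normal((n, r))
u_star /= np.linalg.norm(u_star)
M_star = u_star @ u_star.T

# Gaussian sensing matrices A_i and noiseless measurements y_i = <A_i, M*>.
A = rng.standard_normal((m, n, n))
y = np.einsum('mij,ij->m', A, M_star)

def loss_and_grad(U):
    """Burer-Monteiro objective f(U) = (1/2m) * sum_i (<A_i, U U^T> - y_i)^2."""
    resid = np.einsum('mij,ij->m', A, U @ U.T) - y
    # Gradient: (1/m) * sum_i resid_i * (A_i + A_i^T) U.
    G = np.einsum('m,mij->ij', resid, A) / m
    return 0.5 * np.mean(resid ** 2), (G + G.T) @ U

# Small initialization scale alpha: the regime where the implicit bias appears.
alpha = 1e-3
U = alpha * rng.standard_normal((n, r))

eta = 0.05  # illustrative step size
for t in range(3000):
    loss, grad = loss_and_grad(U)
    U -= eta * grad

print('final loss:', loss)
print('relative recovery error:',
      np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star))
```

In this sketch the small scale alpha is what drives the implicit low-rank bias; the paper's contribution is showing that an analogous bias toward approximate rank-1 solutions arises when GD is instead run over the lifted tensor space.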
Supplementary Material: pdf
Submission Number: 8207