Keywords: Random Matrix Theory, High-Dimensional Statistics, Matrix Denoising, Gradient Flow
TL;DR: Derive a closed-form solution for the learning dynamics of GD-based rank-one matrix denoising and reveal the BBP transition in the large-time limit.
Abstract: Matrix denoising is a crucial component in machine learning, offering valuable insights into the behavior of learning algorithms (Bishop and Nasrabadi, 2006). This paper focuses on the rectangular matrix denoising problem, which involves estimating the left and right singular vectors of a rank-one matrix corrupted by additive noise. Traditional algorithms for this problem often exhibit high computational complexity, which has led to the widespread use of gradient descent (GD)-based estimation methods with a quadratic cost function. However, the learning dynamics of these GD-based methods, particularly the analytical solutions that describe their exact trajectories, have been largely overlooked in the existing literature. To fill this gap, we investigate the learning dynamics in detail, providing convergence proofs and asymptotic analysis. Leveraging tools from large random matrix theory, we derive a closed-form solution for the learning dynamics, characterized by the inner products between the estimates and the ground-truth vectors. We rigorously prove the almost sure convergence of these dynamics as the signal dimensions tend to infinity. Additionally, we analyze the asymptotic behavior of the learning dynamics in the large-time limit, which aligns with the well-known Baik-Ben Arous-Péché (BBP) phase transition phenomenon (Baik et al., 2005). Experimental results support our theoretical findings, demonstrating that when the signal-to-noise ratio (SNR) exceeds a critical threshold, learning converges rapidly from an initial value close to the stationary point. In contrast, estimation becomes infeasible when the ratio of the inner products between the initial left and right vectors and their corresponding ground-truth vectors reaches a specific value that depends on both the SNR and the data dimensions.
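To make the setup concrete, below is a minimal NumPy sketch of GD-based rank-one matrix denoising under an assumed spiked model Y = snr · u* v*ᵀ + W/√n with unit-norm spikes and i.i.d. Gaussian noise. The dimensions, SNR, step size, and iteration count are hypothetical choices for illustration, not the paper's experimental settings; the sketch simply runs gradient descent on the quadratic cost and reports the inner products (overlaps) with the ground-truth vectors that characterize the learning dynamics.

```python
import numpy as np

# Minimal sketch (not the paper's code). Assumed model:
# Y = snr * u_star v_star^T + W / sqrt(n), u_star, v_star unit norm,
# W with i.i.d. N(0, 1) entries; GD on L(u, v) = 0.5 * ||Y - u v^T||_F^2.
rng = np.random.default_rng(0)
m, n = 800, 1200           # signal dimensions (hypothetical values)
snr = 2.0                  # signal-to-noise ratio (hypothetical value)
lr, steps = 0.1, 500       # step size and iteration count (assumptions)

u_star = rng.standard_normal(m); u_star /= np.linalg.norm(u_star)
v_star = rng.standard_normal(n); v_star /= np.linalg.norm(v_star)
Y = snr * np.outer(u_star, v_star) + rng.standard_normal((m, n)) / np.sqrt(n)

# Small random initialization near the stationary point at the origin.
u = rng.standard_normal(m) / np.sqrt(m)
v = rng.standard_normal(n) / np.sqrt(n)

for _ in range(steps):
    r = Y - np.outer(u, v)      # residual of the quadratic cost
    u_new = u + lr * r @ v      # -grad_u L = (Y - u v^T) v
    v = v + lr * r.T @ u        # -grad_v L = (Y - u v^T)^T u
    u = u_new

# Normalized overlaps with the ground truth; above the critical SNR
# these should be bounded away from zero.
print("|<u, u*>| / ||u|| =", abs(u @ u_star) / np.linalg.norm(u))
print("|<v, v*>| / ||v|| =", abs(v @ v_star) / np.linalg.norm(v))
```

Rerunning the sketch with a smaller snr (e.g., well below 1) should drive both overlaps toward zero, qualitatively matching the BBP-type threshold behavior described in the abstract.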
Primary Area: learning theory
Submission Number: 15387