Abstract: A rank-r matrix X ∈ ℝ^{m×n} can be written as a product UV^T, where U ∈ ℝ^{m×r} and V ∈ ℝ^{n×r}. One can exploit this observation in optimization: e.g., consider the minimization of a convex function f(X) over rank-r matrices, where the set of rank-r matrices is modeled via the factorization in U and V variables. Such a heuristic has been widely used for problem instances where the solution is (approximately) low-rank. Though this parameterization reduces the number of variables and is more efficient with respect to computational and memory requirements (of particular interest is the case r ≪ min{m, n}), it comes at a cost: f(UV^T) becomes non-convex in U and V. In this paper, we study this parameterization for optimizing a generic smooth convex f with Lipschitz continuous gradient, and focus on first-order, gradient descent algorithmic solutions. We propose the Bi-Factored Gradient Descent (BFGD) algorithm, an efficient first-order method that operates on the U, V factors. We show that, when f is smooth and BFGD is initialized properly, it converges locally at a sublinear rate to a globally optimal point. As a test case, we consider the 1-bit matrix completion problem: we compare BFGD with state-of-the-art approaches and show that it achieves at least competitive test error on real-dataset experiments, while being faster in execution than the other algorithms. We conclude with some remarks and open questions for further investigation.
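To make the factored update concrete, below is a minimal sketch, assuming a NumPy environment: plain gradient descent on f(UV^T) via the chain rule, with gradients ∇_U = ∇f(UV^T) V and ∇_V = ∇f(UV^T)^T U. The function name `bfgd_sketch`, the random initialization, and the fixed step size `eta` are illustrative placeholders; the paper's BFGD prescribes a specific initialization and step-size rule not reproduced here.

```python
# Hypothetical sketch of factored gradient descent on f(U V^T);
# NOT the paper's reference implementation of BFGD. The step size
# and initialization below are placeholders.
import numpy as np

def bfgd_sketch(grad_f, m, n, r, eta=0.01, iters=2000, seed=0):
    """Minimize a smooth convex f over rank-r matrices via X = U V^T.

    grad_f: callable returning the gradient of f at an m x n matrix X.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, r)) / np.sqrt(m)  # placeholder init
    V = rng.standard_normal((n, r)) / np.sqrt(n)
    for _ in range(iters):
        G = grad_f(U @ V.T)  # gradient of f at the current X = U V^T
        # Chain rule: grad wrt U is G @ V, grad wrt V is G^T @ U.
        U, V = U - eta * (G @ V), V - eta * (G.T @ U)
    return U, V

# Usage example: least squares on observed entries, a smooth convex f
# with f(X) = 0.5 * ||P_Omega(X - X*)||_F^2 and gradient P_Omega(X - X*).
m, n, r = 30, 20, 2
rng = np.random.default_rng(1)
mask = rng.random((m, n)) < 0.5                      # observed entries
X_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
grad = lambda X: mask * (X - X_true)
U, V = bfgd_sketch(grad, m, n, r)                    # small eta: needs
print(np.linalg.norm(mask * (U @ V.T - X_true)))     # eta < 1/(L*||X*||_2)
```

The fixed step size above must be small relative to the gradient's Lipschitz constant and the spectral norm of the solution for the iteration to be stable; choosing it in a principled, data-dependent way is exactly one of the points the paper's analysis addresses.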