Abstract: We study the computational cost of recovering a unit-norm sparse principal component \(x \in \mathbb{R}^n\) planted in a random matrix, in either the Wigner or Wishart spiked model (observing either \(W + \lambda xx^\top\) with W drawn from the Gaussian orthogonal ensemble, or N independent samples from \(\mathcal{N}(0, I_n + \beta xx^\top)\), respectively). Prior work has shown that when the signal-to-noise ratio (\(\lambda\) or \(\beta\sqrt{N/n}\), respectively) is a small constant and the fraction of nonzero entries in the planted vector is \(\Vert x\Vert_0 / n = \rho\), it is possible to recover x in polynomial time if \(\rho \lesssim 1/\sqrt{n}\). While it is possible to recover x in exponential time under the weaker condition \(\rho \ll 1\), it is believed that polynomial-time recovery is impossible unless \(\rho \lesssim 1/\sqrt{n}\). We investigate the precise amount of time required for recovery in the “possible but hard” regime \(1/\sqrt{n} \ll \rho \ll 1\) by exploring the power of subexponential-time algorithms, i.e., algorithms running in time \(\exp(n^\delta)\) for some constant \(\delta \in (0,1)\). For any \(1/\sqrt{n} \ll \rho \ll 1\), we give a recovery algorithm with runtime roughly \(\exp(\rho^2 n)\), demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the \(\exp(\rho n)\)-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal.
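To make the setting concrete, the following is a minimal sketch (not the paper's subexponential algorithm) that simulates the spiked Wishart model defined above and runs diagonal thresholding, the polynomial-time endpoint of the tradeoff the abstract describes. All parameter values (`n`, `N`, `beta`, `rho`) are hypothetical choices for illustration only.

```python
# Illustrative sketch only: spiked Wishart model + diagonal thresholding.
# This is the polynomial-time endpoint of the family of algorithms described
# in the abstract, not the paper's subexponential-time algorithm.
import numpy as np

rng = np.random.default_rng(0)

n, N = 2000, 4000        # dimension and sample count (hypothetical values)
rho = 0.01               # sparsity fraction: k = rho * n nonzero entries
beta = 3.0               # spike strength; SNR is roughly beta * sqrt(N / n)
k = int(rho * n)

# Planted unit-norm k-sparse vector x.
support = rng.choice(n, size=k, replace=False)
x = np.zeros(n)
x[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)

# Draw N samples from N(0, I_n + beta * x x^T): each row is
# z + sqrt(beta) * g * x with z ~ N(0, I_n) and g ~ N(0, 1), whose
# covariance is exactly I_n + beta * x x^T.
Y = rng.standard_normal((N, n)) \
    + np.sqrt(beta) * np.outer(rng.standard_normal(N), x)

# Diagonal thresholding: the diagonal of the empirical covariance is
# inflated by beta * x_i^2 on the support, so take the k largest diagonal
# entries as the estimated support, then the top eigenvector of the
# covariance restricted to that support.
cov = Y.T @ Y / N
est_support = np.argsort(np.diag(cov))[-k:]
sub = cov[np.ix_(est_support, est_support)]
eigvals, eigvecs = np.linalg.eigh(sub)
x_hat = np.zeros(n)
x_hat[est_support] = eigvecs[:, -1]  # eigenvector of the largest eigenvalue

print(f"overlap |<x_hat, x>| = {abs(x_hat @ x):.3f}")  # near 1 = recovery
```

With these (hypothetical) parameters, \(\rho = 0.01 \approx 0.45/\sqrt{n}\) sits near the polynomial-time threshold, so diagonal thresholding succeeds; for \(\rho\) well above \(1/\sqrt{n}\) the abstract's \(\exp(\rho^2 n)\)-time family would be needed instead.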