Abstract: Recent studies of high-dimensional covariance estimation often assume the proportional growth asymptotic, where the sample size n and dimension p are comparable, with n, p → ∞ and γn ≡ p/n → γ > 0.
Yet many datasets (perhaps most) have very different numbers of rows and columns. Consider instead
disproportional growth, where n, p → ∞ and γn → 0 or γn → ∞. With far fewer dimensions than
observations, the disproportional limit γn → 0 may seem similar to classical fixed-p asymptotics. In fact,
either disproportional limit induces novel phenomena distinct from the proportional and fixed-p limits.
We study the spiked covariance model, with population covariance a low-rank perturbation of the
identity. For each of 15 different loss functions, and each disproportional limit, we exhibit in closed form
new optimal shrinkage and thresholding rules; optimality takes the particularly strong form of unique
asymptotic admissibility. Readers who initially view the disproportional limit γn → 0 as similar to
classical fixed-p asymptotics may expect, given the dominance of the sample covariance estimator in that
setting, that there is no better alternative here. On the contrary, although the sample covariance is consistent
as γn → 0, our optimal procedures demand extensive eigenvalue shrinkage and offer substantial performance benefits. The sample covariance is similarly improvable in the disproportional limit γn → ∞.
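A small simulation makes the point about consistency concrete. It is our own illustration, not taken from the paper, and it assumes Gaussian data with population covariance Σ = I; the helper name `scaled_null_spectrum` is hypothetical. With γn = p/n = 0.005, every sample eigenvalue lies close to 1, as consistency requires. Yet after centering at 1 and dividing by √γn, the eigenvalues still spread over roughly [-2, 2], with mean near 0 and variance near 1, matching the semicircle law on that interval. This √γn scale is, roughly, where shrinkage still has room to help.

```python
import numpy as np

def scaled_null_spectrum(n, p, seed=0):
    """Null sample-covariance eigenvalues (Sigma = I), raw and centered/scaled by sqrt(p/n)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))      # n observations of a p-dimensional N(0, I) vector
    S = X.T @ X / n                      # sample covariance (mean known to be zero)
    gamma = p / n                        # aspect ratio gamma_n
    evals = np.linalg.eigvalsh(S)
    return evals, (evals - 1.0) / np.sqrt(gamma)

evals, x = scaled_null_spectrum(n=20000, p=100)
print(evals.min(), evals.max())              # all close to 1: S is consistent
print(x.min(), x.max(), x.mean(), x.var())   # spread over roughly [-2, 2]
```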
Practitioners may worry about how to choose between proportional and disproportional growth frameworks
in practice. Conveniently, under the spiked covariance model there is no conflict between the two and
no choice is needed; one unified set of closed forms (used with the aspect ratio γn of the practitioner’s
data) offers full asymptotic optimality in both regimes.
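As a hedged sketch of how such a rule is applied to data, the code below eigendecomposes the sample covariance, replaces each eigenvalue λ by η(λ; γn) with γn = p/n taken from the data, and reassembles the estimate. The particular η is the Frobenius-loss shrinker from the proportional-regime spiked covariance literature, with the noise level assumed known and equal to 1. That this formula is among the paper's unified closed forms is our assumption, not something this abstract states; `frobenius_shrinker` and `shrink_covariance` are hypothetical helper names.

```python
import numpy as np

def frobenius_shrinker(lam, gamma):
    """Frobenius-loss eigenvalue shrinker, spiked covariance with noise level 1 (sketch)."""
    edge = (1.0 + np.sqrt(gamma)) ** 2      # upper edge of the Marchenko-Pastur bulk
    if lam <= edge:
        return 1.0                          # no detectable spike: threshold to the noise level
    b = lam + 1.0 - gamma
    ell = (b + np.sqrt(b * b - 4.0 * lam)) / 2.0   # invert lam = ell + gamma*ell/(ell - 1)
    c2 = (1.0 - gamma / (ell - 1.0) ** 2) / (1.0 + gamma / (ell - 1.0))  # limiting cos^2
    return ell * c2 + (1.0 - c2)

def shrink_covariance(S, gamma):
    """Return U diag(eta(lambda)) U^T, keeping the sample eigenvectors U."""
    evals, evecs = np.linalg.eigh(S)
    eta = np.array([frobenius_shrinker(l, gamma) for l in evals])
    return (evecs * eta) @ evecs.T          # scales column j of U by eta[j]

rng = np.random.default_rng(1)
n, p = 4000, 200
gamma = p / n                               # the practitioner's aspect ratio
ell = 1.0 + 3.0 * np.sqrt(gamma)            # one population spike above the detection threshold
Sigma = np.eye(p)
Sigma[0, 0] = ell
X = rng.standard_normal((n, p))
X[:, 0] *= np.sqrt(ell)                     # rows are N(0, Sigma)
S = X.T @ X / n
Sigma_hat = shrink_covariance(S, gamma)
print(np.linalg.norm(S - Sigma), np.linalg.norm(Sigma_hat - Sigma))
```

Eigenvalues at or below the bulk edge (1 + √γn)² are set to 1, which is the thresholding part of the rule. The detected spike is shrunk to 1 + (ℓ − 1)c², below even the population value ℓ, because the sample eigenvector is only partly aligned with the true one.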
At the heart of these phenomena is the spiked Wigner model, in which we seek to recover a low-rank matrix perturbed by symmetric noise. The eigenvalue distributions of the spiked covariance under
disproportional growth (appropriately scaled) and of the spiked Wigner model converge to a common limit, the
semicircle law. Exploiting this connection, we derive optimal performance levels and eigenvalue shrinkage
formulas for the spiked Wigner setting, of independent and fundamental interest. These formulas visibly
correspond to our formulas for optimal shrinkage in covariance estimation.
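For the spiked Wigner side, a minimal sketch under stated assumptions: a rank-one signal θvvᵀ with ‖v‖ = 1, GOE noise normalized so its bulk fills [-2, 2], and Frobenius loss. Standard spiked-Wigner facts then give, for θ > 1, a top eigenvalue near λ = θ + 1/θ and a squared cosine ⟨u, v⟩² near 1 − 1/θ². Minimizing ‖η uuᵀ − θvvᵀ‖²_F over η yields η = θ(1 − 1/θ²), which simplifies to √(λ² − 4). This is our own derivation for one loss, not a formula quoted from the paper; `frobenius_shrinker_wigner`, `spiked_wigner`, and `frob_loss` are hypothetical helpers.

```python
import numpy as np

def frobenius_shrinker_wigner(lam):
    """Frobenius-loss shrinker, rank-one spiked Wigner with noise bulk [-2, 2] (sketch)."""
    if lam <= 2.0:
        return 0.0                                  # inside the bulk: threshold to zero
    theta = (lam + np.sqrt(lam ** 2 - 4.0)) / 2.0   # invert lam = theta + 1/theta
    c2 = 1.0 - 1.0 / theta ** 2                     # limiting squared cosine <u, v>^2
    return theta * c2                               # algebraically equal to sqrt(lam**2 - 4)

def spiked_wigner(n, theta, rng):
    """Return Y = theta * v v^T + W with W a GOE matrix scaled so its bulk tends to [-2, 2]."""
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2.0 * n)                # off-diagonal variance 1/n, diagonal 2/n
    return theta * np.outer(v, v) + W, v

def frob_loss(a, theta, u, v):
    """||a u u^T - theta v v^T||_F^2 for unit vectors u and v."""
    return a ** 2 - 2.0 * a * theta * (u @ v) ** 2 + theta ** 2

rng = np.random.default_rng(0)
theta = 3.0
Y, v = spiked_wigner(1000, theta, rng)
evals, evecs = np.linalg.eigh(Y)                    # eigenvalues in ascending order
lam, u = evals[-1], evecs[:, -1]                    # top eigenpair
eta = frobenius_shrinker_wigner(lam)
print(lam, eta, frob_loss(lam, theta, u, v), frob_loss(eta, theta, u, v))
```

Read against the covariance case, x = (λ − 1)/√γn plays the role of the Wigner eigenvalue. By our calculation, under that rescaling the covariance Frobenius shrinker also tends to the √(x² − 4) form as γn → 0, one instance of the correspondence described above.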