Abstract: We study the classification problem for
high-dimensional data with $n$ observations on $p$ features where the
$p \times p$ covariance matrix $\Sigma$ exhibits a spiked eigenvalue structure and the
vector $\zeta$, given by the difference between the {\em whitened} mean
vectors, is sparse. We analyze a classifier that is adaptive to the
sparsity level $s$ and that first performs dimension reduction on the
feature vectors before classifying in the reduced space: the classifier
whitens the data, screens the features by keeping only those
corresponding to the $s$ largest coordinates of $\zeta$, and finally
applies Fisher's linear discriminant to the selected features. Leveraging recent
results on entrywise matrix perturbation bounds for covariance
matrices, we show that the resulting classifier is Bayes optimal
whenever $n \rightarrow \infty$ and $s \sqrt{n^{-1} \ln p} \rightarrow
0$. Notably, our theory also guarantees Bayes optimality for the corresponding quadratic discriminant analysis (QDA) classifier. Experimental results on real and synthetic data further indicate that the proposed approach is competitive with state-of-the-art methods while operating on a substantially lower-dimensional representation.
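To make the whiten-screen-LDA pipeline concrete, below is a minimal sketch assuming two Gaussian-like classes with equal priors and a naive pooled-covariance whitening step; the function name `whiten_screen_lda`, the eigenvalue floor `eps`, and the use of a plain eigendecomposition (rather than the paper's spiked-covariance machinery) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def whiten_screen_lda(X0, X1, s, eps=1e-8):
    # Class means and pooled sample covariance (illustrative sketch).
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Xc = np.vstack([X0 - mu0, X1 - mu1])
    Sigma = Xc.T @ Xc / (Xc.shape[0] - 2)

    # Whitening matrix Sigma^{-1/2} via eigendecomposition; a
    # spiked-covariance estimator would replace this step in practice.
    w, V = np.linalg.eigh(Sigma)
    W = V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

    # Whitened mean difference zeta; screen by keeping the s
    # coordinates of largest magnitude.
    zeta = W @ (mu1 - mu0)
    keep = np.argsort(np.abs(zeta))[-s:]

    # In whitened coordinates the covariance is ~identity, so Fisher's
    # rule reduces to thresholding the projection onto zeta restricted
    # to the selected features (equal class priors assumed).
    mid = 0.5 * (W @ (mu0 + mu1))[keep]

    def classify(X):
        Z = (X @ W)[:, keep]  # whiten, then screen
        return ((Z - mid) @ zeta[keep] > 0).astype(int)

    return classify
```

As a usage sketch, `classify = whiten_screen_lda(X0, X1, s=10)` fits the rule on two training matrices and `classify(X_test)` returns 0/1 labels; screening before the discriminant step is what confines the final classification to an $s$-dimensional representation.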
Certifications: J2C Certification
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Trevor_Campbell1
Submission Number: 5423