Subject Clustering by an Improved IF-PCA Algorithm

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0
Keywords: gene microarray, scRNA-seq, feature selection, manifold fitting, nonlinearity, PCA, sparsity, subject clustering
Abstract: Subject (e.g., cell or patient) clustering is an important problem in genetics and genomics. Influential Features PCA (IF-PCA) is a recent clustering method that first selects a small fraction of the measured features and then clusters the subjects (e.g., cells or patients) into groups using the classical PCA clustering approach. One challenge is that the signal and noise structures may be complex across features, across subjects, or both, which can make IF-PCA less effective. To address this challenge, we propose a new approach, IFPCA+, which combines IF-PCA with the recent idea of manifold fitting; the latter has been shown to better support class separation. We compare our approach with the most popular subject clustering approaches, including but not limited to DESC, SC3, and Seurat, using 10 gene microarray data sets and 8 single-cell data sets. We show that the new method significantly improves feature selection accuracy and that, on average, it outperforms several of the most competitive current algorithms (including IF-PCA, DESC, and Seurat) in terms of clustering accuracy and adjusted Rand index (ARI). We also shed light on the insight underlying these improvements.
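The two-stage idea described in the abstract (select a small fraction of features, then cluster with PCA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes features are ranked by a Kolmogorov-Smirnov distance from the standard normal, and it replaces any data-driven threshold with a fixed top fraction `frac`, which is a hypothetical parameter. The manifold fitting step of IFPCA+ is not shown. The names `ks_scores`, `kmeans`, and `ifpca_sketch` are illustrative only.

```python
import math
import numpy as np

def ks_scores(X):
    """Kolmogorov-Smirnov distance of each standardized feature from N(0, 1)."""
    n, _ = X.shape
    Z = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) + 1e-12)
    Z = np.sort(Z, axis=0)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(Z / math.sqrt(2.0)))
    upper = np.arange(1, n + 1)[:, None] / n  # empirical CDF at each sorted point
    lower = np.arange(0, n)[:, None] / n      # empirical CDF just before each point
    return np.maximum(upper - cdf, cdf - lower).max(axis=0)

def kmeans(U, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; a stand-in for any k-means routine."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def ifpca_sketch(X, k, frac=0.05, seed=0):
    """Rows of X are subjects, columns are features; returns (labels, selected)."""
    _, p = X.shape
    scores = ks_scores(X)
    m = max(1, int(round(frac * p)))          # assumption: fixed fraction of features
    selected = np.argsort(scores)[::-1][:m]   # features with the largest KS scores
    Z = X[:, selected]
    Z = (Z - Z.mean(axis=0)) / (Z.std(axis=0, ddof=1) + 1e-12)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    labels = kmeans(U[:, : max(1, k - 1)], k, seed=seed)  # first k-1 left singular vectors
    return labels, selected
```

With a data matrix `X` of shape (subjects, features), a call such as `labels, selected = ifpca_sketch(X, k=2, frac=0.04)` returns one cluster label per subject and the indices of the retained features. The fixed-fraction cutoff is the simplest possible choice and would need tuning on real data.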
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 9117