Comparative Analysis of k-Selection Methods in Non-Negative Matrix Factorization for Transcriptomic Data Analysis: The Superiority of Silhouette Analysis

10 Sept 2025 (modified: 08 Oct 2025)
Submitted to Agents4Science
License: CC BY 4.0
Keywords: NMF, parameter selection, transcriptomics
TL;DR: A comparative analysis to identify the best metric for selecting the number of factors (k) in non-negative matrix factorization
Abstract: Non-negative matrix factorization (NMF) has emerged as a powerful technique for dimensionality reduction and pattern discovery in transcriptomic data analysis. However, selecting the optimal number of factors (k) remains a significant challenge, particularly when balancing mathematical rigor with biological interpretability. We present a comparative analysis of k-selection methods, including group-correlation maximization, reconstruction error minimization, PERMANOVA-based selection, and silhouette analysis. Applied to a large-scale transcriptomic dataset of 163 samples across 42 experimental conditions (combinations of genotype, treatment, and timepoint), silhouette analysis selected k=7 and performed best, distributing discriminative power uniformly across factors while providing sufficient resolution to distinguish sample groups. The k=7 solution balances avoiding overfitting at higher k values against maintaining adequate biological resolution, supporting silhouette analysis as the superior approach for NMF k-selection in transcriptomic applications.
Submission Number: 97
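
The silhouette-based k selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say how samples are grouped or which NMF solver is used. The sketch assumes Lee-Seung multiplicative updates, Euclidean distances in factor space, and cluster labels taken from each sample's dominant factor. The names `nmf`, `silhouette`, and `select_k` and the synthetic matrix `V` are hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=300, seed=0, eps=1e-9):
    """Approximate V (genes x samples) as W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def silhouette(X, labels):
    """Mean silhouette coefficient over the rows of X (Euclidean distance)."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return float("nan")  # undefined when only one cluster is present
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # convention: a singleton cluster scores 0
        a = D[i, same].sum() / (n_same - 1)  # D[i, i] is 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

def select_k(V, k_values):
    """Return the k with the highest mean silhouette, plus all scores."""
    results = {}
    for k in k_values:
        _, H = nmf(V, k)
        samples = H.T                    # one row per sample, one column per factor
        labels = samples.argmax(axis=1)  # assumption: dominant factor defines the cluster
        results[k] = silhouette(samples, labels)
    valid = {k: s for k, s in results.items() if not np.isnan(s)}
    best = max(valid, key=valid.get) if valid else None
    return best, results

# Synthetic non-negative "expression" matrix with three sample groups (illustrative only).
rng = np.random.default_rng(42)
V = rng.random((40, 30)) * 0.1
for g in range(3):
    V[g * 10:(g + 1) * 10, g * 10:(g + 1) * 10] += 1.0

best_k, scores = select_k(V, range(2, 7))
print("best k:", best_k)
print("scores:", scores)
```

With real data, the labels could instead come from the experimental conditions, and the score would typically be averaged over several random NMF initializations, since a single run can land in a poor local optimum.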