Gradient-Based Gene Selection for Multimodal scRNA-seq Foundation Models

Published: 05 Mar 2025, Last Modified: 07 May 2025
Venue: MLGenX 2025 TinyPapers
Readers: Everyone
License: CC BY 4.0
Track: Tiny paper track (up to 5 pages)
Abstract: Foundation models have emerged as powerful tools for analyzing single-cell RNA sequencing (scRNA-seq) data. However, selecting informative gene features for both input to the model and analysis in the output remains a critical challenge. Traditional feature selection methods filter on the basis of highly variable genes and analyze them using differential distribution, but they often struggle with scalability and robustness in heterogeneous, high-dimensional datasets. In this study, we explore the limitations of conventional feature selection techniques in the context of a multimodal foundation model and propose alternative gradient-based attribution techniques on learned feature embeddings to improve feature selection. We demonstrate how our selection strategy enhances model performance, overcomes the limitations of traditional approaches, and holds the potential to reveal the inherent polygenicity of diseases.
Submission Number: 50
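
The contrast the abstract draws between variance-based filtering and gradient-based attribution can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a small logistic model with hand-set weights in place of the multimodal foundation model, applies gradient-times-input attribution to raw expression values rather than to learned feature embeddings, and uses synthetic data. The names `gradient_x_input_scores` and `top_k_genes` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def gradient_x_input_scores(X, w, b):
    """Mean |gradient * input| per gene for a toy model p = sigmoid(X @ w + b)."""
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    # Analytic gradient of p with respect to each input gene: p * (1 - p) * w_j
    grads = (p * (1.0 - p))[:, None] * w[None, :]
    # Gradient-times-input attribution, averaged over cells
    return np.abs(grads * X).mean(axis=0)

def top_k_genes(scores, k):
    """Indices of the k highest-scoring genes, best first."""
    return np.argsort(scores)[::-1][:k]

# Synthetic expression matrix (assumption: 200 cells, 50 genes, Gaussian noise)
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50
X = rng.normal(size=(n_cells, n_genes))
X[:, 0] *= 5.0  # gene 0 is highly variable but unused by the toy model

# Hand-set weights (assumption): only genes 3 and 7 influence the prediction
w = np.zeros(n_genes)
w[[3, 7]] = [2.0, -1.5]
b = 0.0

scores = gradient_x_input_scores(X, w, b)
grad_top = top_k_genes(scores, 2)      # genes the model actually relies on
hvg_top = top_k_genes(X.var(axis=0), 2)  # genes ranked by variance alone
```

In this toy setting the variance ranking favors gene 0, which the model never uses, while the attribution ranking recovers the two genes that carry weight. With a real foundation model the gradients would typically come from automatic differentiation rather than a closed-form expression.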