From Genomic Signals to Protein Behaviors: AI-Driven Feature Selection for Bioinformatics Modeling

Published: 05 Sept 2025, Last Modified: 05 Sept 2025. Venue: OpenReview Archive Direct Upload. Readers: Everyone. License: CC BY 4.0.
Abstract: Feature selection is a critical step in AI-driven bioinformatics, enabling models to extract the most informative signals from high-dimensional genomic and multi-omics datasets. By reducing noise and redundancy, feature selection improves generalization and reduces the risk of overfitting, allowing AI systems to better capture the biological principles that underlie protein function and interaction. Recent advances in filter, wrapper, embedded, and hybrid feature selection methods have extended their applications beyond classical gene prioritization to cancer biomarker discovery, single-cell analysis, quantitative trait mapping, and integrative multi-omics studies. In this paper, we review emerging feature selection approaches, with a particular focus on their role in uncovering the determinants of complex protein behaviors. We further discuss how feature selection can be integrated with AI models to identify disease-associated features and guide mechanistic insight. Finally, we outline future research directions in which feature selection can inform the design and interpretation of high-throughput experiments, advancing our ability to predict, model, and control emergent protein behaviors and disease processes.
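To make the idea of reducing noise and redundancy concrete, the following is a minimal sketch of a filter-style selector, not the method reviewed or proposed in this paper. It ranks features by absolute Pearson correlation with the label and greedily skips any feature that is highly correlated with one already chosen, a simple relevance-plus-redundancy heuristic. The function names `pearson` and `filter_select`, the default `redundancy_threshold` of 0.9, and the toy data are illustrative assumptions. Filter methods like this score features without training a downstream model; wrapper and embedded methods instead judge features through a model's performance or its fitted parameters.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_select(X, y, k, redundancy_threshold=0.9):
    """Hypothetical filter-style selector (illustrative only).

    X: list of samples, each a list of numeric feature values.
    y: list of numeric labels, one per sample.
    Returns up to k feature indices, ranked by |corr(feature, y)|,
    skipping features whose |corr| with an already-selected feature
    exceeds redundancy_threshold.
    """
    columns = list(zip(*X))  # transpose: one tuple of values per feature
    relevance = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    selected = []
    for j in ranked:
        if len(selected) == k:
            break
        # Keep feature j only if it is not redundant with any chosen feature.
        if all(abs(pearson(columns[j], columns[s])) <= redundancy_threshold
               for s in selected):
            selected.append(j)
    return selected


# Toy data: columns 0 and 1 are near-duplicate informative features,
# column 2 is unrelated noise.
X = [
    [0.1, 0.3, 0.5],
    [0.2, 0.4, 0.1],
    [0.0, 0.0, 0.9],
    [1.0, 2.0, 0.4],
    [1.1, 2.2, 0.8],
    [0.9, 1.9, 0.2],
]
y = [0, 0, 0, 1, 1, 1]

print(filter_select(X, y, k=2))  # → [1, 2]
```

On this toy data the selector keeps column 1 (slightly more correlated with the label), rejects column 0 as redundant with it, and fills the second slot with column 2 only because `k=2` requests two features. In practice `k` and the redundancy threshold would be tuned, for example by cross-validation, and the correlation score could be swapped for a measure such as mutual information when relationships are nonlinear.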