Keywords: feature selection, tabular data, dimensionality reduction, supervised learning, Gumbel-Softmax
TL;DR: An end-to-end feature selection method that automatically finds a minimal set of relevant features with near-constant overhead.
Abstract: Feature selection (FS) is a fundamental challenge in machine learning, particularly for high-dimensional tabular data, where interpretability and computational efficiency are critical. Existing FS methods often cannot automatically detect the number of attributes required to solve a given task and instead demand user intervention or model retraining under different feature budgets. Additionally, they either neglect feature relationships (filter methods) or require time-consuming optimization (wrapper and embedded methods). To address these limitations, we propose AutoNFS, which combines an FS module based on Gumbel-Sigmoid sampling with a predictive model that evaluates the relevance of the selected attributes. The model is trained end-to-end using a differentiable loss and automatically determines the minimal set of features essential for a given downstream task. Unlike existing approaches, AutoNFS incurs a nearly constant computational overhead regardless of input dimensionality, making it scalable to large data spaces. We evaluate AutoNFS on well-established classification and regression benchmarks as well as real-world metagenomic datasets. The results show that AutoNFS consistently outperforms both classical and neural FS methods while selecting significantly fewer features. We share our implementation of AutoNFS at https://anonymous.4open.science/r/AutoNFS-8753.
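For intuition, the core mechanism the abstract describes, a differentiable per-feature gate driven by Gumbel-Sigmoid sampling, can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' AutoNFS implementation: the names `gumbel_sigmoid` and `FeatureGate`, the temperature `tau`, and the straight-through thresholding are hypothetical choices for exposition.

```python
import torch


def gumbel_sigmoid(logits, tau=1.0, hard=False):
    """Relaxed Bernoulli gate via Gumbel-Sigmoid sampling (illustrative sketch)."""
    # Logistic(0, 1) noise (the difference of two Gumbel(0, 1) samples),
    # drawn via the inverse CDF of the logistic distribution.
    u = torch.rand_like(logits).clamp(1e-8, 1 - 1e-8)
    noise = torch.log(u) - torch.log1p(-u)
    y_soft = torch.sigmoid((logits + noise) / tau)
    if hard:
        # Straight-through estimator: discrete 0/1 gates in the forward pass,
        # gradients flow through the soft relaxation.
        y_hard = (y_soft > 0.5).float()
        return y_hard + (y_soft - y_soft.detach())
    return y_soft


class FeatureGate(torch.nn.Module):
    """Learnable per-feature mask; hypothetical stand-in for an FS module."""

    def __init__(self, n_features, tau=1.0):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(n_features))
        self.tau = tau

    def forward(self, x):
        gates = gumbel_sigmoid(self.logits, self.tau, hard=self.training)
        return x * gates  # zero out deselected features, keep the rest
```

In such a setup, the gated input would be fed to a downstream predictive model and trained jointly with a differentiable loss; pushing the gates toward a minimal feature subset would additionally require some sparsity pressure on the gate probabilities. The specific loss and selection criterion used by AutoNFS are not reproduced here.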
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 4296