Post-Hoc Feature Selection Layer for Neural Networks Interpretability

ICLR 2026 Conference Submission 22264 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: feature selection, deep learning interpretability, post-hoc feature weighting

TL;DR: We introduce a post-hoc Feature Selection Layer as a lightweight, trainable module that integrates with already frozen pre-trained models on tabular datasets to highlight the features the original model considers most important.

Abstract: The interpretability of complex neural networks remains a critical challenge, especially for models already deployed in high-stakes domains. To address this, we introduce a post-hoc adaptation of the Feature Selection Layer (FSL). Our approach reframes the FSL as a lightweight, trainable module that integrates with frozen pre-trained models on tabular datasets to highlight the features the original model considers most important. The post-hoc FSL learns relevance weights for the input features by fine-tuning those weights against the original model's learned outputs. Crucially, this process is non-invasive: it alters neither the original model's architecture nor its learned parameters. We evaluate the method with both statistical and visual metrics, including accuracy, F1 score, recall, precision, weighted t-SNE, and silhouette score, and we analyze its stability on high-dimensional synthetic and real-world tabular datasets. Using these metrics, we compare the post-hoc FSL feature weighting method against the original embedded FSL and against other post-hoc interpretability methods: Integrated Gradients, Noise Tunnel, DeepLIFT, Gradient SHAP, and Feature Ablation. Experimental results show that the post-hoc FSL successfully identifies relevant features across the different datasets, maintaining the predictive power of the original neural network while enhancing its interpretability. The post-hoc FSL achieves predictive, visual, and stability results comparable to those of the original FSL, and it shows distinct advantages over the other state-of-the-art methods. Despite a trade-off in the Jaccard, Spearman, and Pearson stability metrics, the post-hoc FSL yields, on average, superior visual and clustering-based interpretability on real-world datasets, as measured by weighted t-SNE and the silhouette score.
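The mechanism described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the frozen logistic model, the element-wise weighting `x * w`, the uniform initialization, the cross-entropy objective against the frozen model's own outputs, and the small L1 penalty are all assumptions made for illustration, and the names `frozen_model` and `train_post_hoc_fsl` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen model: a logistic regression whose parameters are
# never updated. By construction it ignores features 2 and 3.
V = np.array([2.0, -1.5, 0.0, 0.0])
B = 0.1


def frozen_model(x):
    """Return the frozen model's predicted probabilities for each row of x."""
    return 1.0 / (1.0 + np.exp(-(x @ V + B)))


def train_post_hoc_fsl(x, steps=500, lr=0.5, l1=1e-3):
    """Learn one relevance weight per input feature (illustrative sketch).

    Assumptions, not taken from the paper: inputs are scaled element-wise
    by the weights, the objective is cross-entropy against the frozen
    model's outputs on the unweighted inputs, and an L1 penalty shrinks
    weights the model does not depend on.
    """
    n_features = x.shape[1]
    w = np.full(n_features, 1.0 / n_features)  # uniform initialization
    target = frozen_model(x)                   # original model outputs
    for _ in range(steps):
        pred = frozen_model(x * w)             # frozen model on weighted inputs
        # Gradient of mean cross-entropy w.r.t. the logits is (pred - target) / N.
        grad_logit = (pred - target) / len(x)
        # Chain rule: logit = sum_j V[j] * w[j] * x[:, j] + B.
        grad_w = (grad_logit[:, None] * x * V).sum(axis=0) + l1 * np.sign(w)
        w -= lr * grad_w                       # only w changes; V and B stay fixed
    return w


x = rng.normal(size=(512, 4))
w = train_post_hoc_fsl(x)
ranking = np.argsort(-np.abs(w))
print("relevance weights:", np.round(w, 3))
print("features ranked by relevance:", ranking)
```

In this toy setting the weights for the two features the frozen model actually uses grow toward 1, while the weights for the ignored features receive no data gradient and are shrunk by the penalty. Only `w` is updated, which mirrors the non-invasive property the abstract emphasizes.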

Primary Area: interpretability and explainable AI

Submission Number: 22264
