Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical GuaranteesDownload PDF

Published: 31 Oct 2022, Last Modified: 12 Oct 2022NeurIPS 2022 AcceptReaders: Everyone
Keywords: variable selection, posterior distribution, Bayesian nonparametric
Abstract: We develop a simple and unified framework for nonlinear variable importance estimation that incorporates uncertainty in the prediction function and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods, neural networks, etc). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated partial derivative $\Psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_{P_\mathcal{X}}$. We then (1) provide a principled approach for quantifying uncertainty in variable importance by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulations and experiments on healthcare benchmark datasets confirm that the proposed algorithm outperforms existing classical and recent variable selection methods.
Supplementary Material: pdf
TL;DR: This method provides a theoretically-grounded approach for quantifying variable importance that is applicable to a wide range of ML models.
19 Replies

Loading