Interpreting Black-boxes Using Primitive Parameterized Functions

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: Interpretability, Symbolic Metamodeling, Symbolic Regression
Abstract: One approach to interpreting black-box machine learning models is to find a global approximation of the model using simple interpretable functions, called a metamodel (a model of the model). Such a metamodel can be used to 1) estimate instance-wise feature importance; 2) understand the functional form of the model; and 3) analyze feature interactions. In this work, we propose a new method for finding interpretable metamodels. Our approach builds on the Kolmogorov superposition theorem, which expresses multivariate functions as compositions of univariate functions (our primitive parameterized functions). This composition can be represented as a tree. Inspired by symbolic regression, we use a modified form of genetic programming to search over different tree configurations, and gradient descent to optimize the parameters of a given configuration. In several experiments, we show that our method outperforms recent metamodeling approaches for interpreting black boxes.
One-sentence Summary: Interpreting black boxes by finding a metamodel using parameterized functions trained via gradient descent and genetic programming.
Supplementary Material: zip
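
The listing above carries no code; as a loose illustration of the idea in the abstract, the sketch below fits a fixed Kolmogorov-superposition-style metamodel (a sum of outer parameterized univariate primitives applied to sums of inner ones, one per feature) to a stand-in black-box function by gradient descent. This is a minimal sketch under assumed design choices: the primitive form, the use of PyTorch, and the names `UnivariatePrimitive` and `KSTMetamodel` are all hypothetical, not the authors' implementation, and the genetic-programming search over tree configurations described in the abstract is omitted.

```python
# Illustrative sketch only: a fixed Kolmogorov-superposition-style metamodel
#   f(x) ~= sum_q g_q( sum_p h_{q,p}(x_p) )
# where every h and g is a small parameterized univariate "primitive".
# Parameters are fit by gradient descent to mimic a black-box function on
# sampled inputs; the paper's genetic-programming search over tree
# configurations is not shown.
import torch
import torch.nn as nn

class UnivariatePrimitive(nn.Module):
    """Parameterized univariate function a*x + b*sin(c*x + d) + e (hypothetical choice)."""
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(torch.randn(5) * 0.1)

    def forward(self, x):
        a, b, c, d, e = self.p
        return a * x + b * torch.sin(c * x + d) + e

class KSTMetamodel(nn.Module):
    """Sum of Q outer primitives, each applied to a sum of per-feature inner primitives."""
    def __init__(self, n_features, n_outer=3):
        super().__init__()
        self.inner = nn.ModuleList(
            nn.ModuleList(UnivariatePrimitive() for _ in range(n_features))
            for _ in range(n_outer)
        )
        self.outer = nn.ModuleList(UnivariatePrimitive() for _ in range(n_outer))

    def forward(self, x):                       # x: (batch, n_features)
        out = 0.0
        for hs, g in zip(self.inner, self.outer):
            s = sum(h(x[:, p]) for p, h in enumerate(hs))
            out = out + g(s)
        return out

def black_box(x):
    """Stand-in for any opaque model's prediction function."""
    return torch.exp(-x[:, 0] ** 2) + x[:, 1] * x[:, 2]

torch.manual_seed(0)
X = torch.rand(2048, 3) * 2 - 1                 # sample the input domain
y = black_box(X)                                # query the black box for labels

meta = KSTMetamodel(n_features=3)
opt = torch.optim.Adam(meta.parameters(), lr=1e-2)
for step in range(2000):                        # gradient descent on primitive parameters
    opt.zero_grad()
    loss = torch.mean((meta(X) - y) ** 2)
    loss.backward()
    opt.step()

final_mse = torch.mean((meta(X) - y) ** 2).item()
print(f"metamodel fidelity (MSE to black box): {final_mse:.4f}")
```

In the method the abstract describes, the tree configuration itself (which primitives appear and how they compose) would additionally be evolved by genetic programming; the sketch fixes a single configuration so that only the gradient-descent step is visible.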