Keywords: Model Understanding, Data Generation
TL;DR: To explain trained models, we pose questions in the form of functions on the data space; the answers are generated data points.
Abstract: There is a growing need to investigate how machine learning models operate. In this work, we aim to understand trained machine learning models by questioning their data preferences. We propose a mathematical framework that allows us to probe trained models and identify their preferred samples in various scenarios, including prediction-risky, parameter-sensitive, and model-contrastive samples. To showcase our framework, we pose these queries to a variety of models trained on a range of classification and regression tasks, and receive answers in the form of generated data.
Primary Area: interpretability and explainable AI
Submission Number: 14621