Abstract: The primary challenge for practitioners faced with multiple \textit{post-hoc gradient-based} interpretability methods is to benchmark them and select the best one. Using information theory, we cast finding the optimal explainer as a rate-distortion optimization problem. Therefore:
\begin{itemize}
\item We propose an information-theoretic test \verb|InfoExplain| that resolves the benchmarking ambiguity in a model-agnostic manner without additional user data (beyond the input features, the model, and the explanations).
\item We show that \verb|InfoExplain| can be extended to utilise human-interpretable concepts, deliver performance guarantees, and filter out erroneous explanations.
\end{itemize}
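As a rough illustration of the rate-distortion framing (the notation below is ours for exposition, not taken from the paper), one would seek the least informative explanation channel that still lets a decoder reproduce the model's behaviour within a distortion budget:
\[
R(D) \;=\; \min_{\,p(e \mid x)\;:\;\mathbb{E}\left[d\bigl(f(x),\, g(e)\bigr)\right] \,\le\, D} \; I(X; E),
\]
where $f$ is the model, $E$ the explanation produced from input $X$, $g$ a hypothetical decoder from explanations back to predictions, and $d$ a distortion measure; the optimal explainer would then be the one achieving the lowest rate $I(X;E)$ at a given distortion level $D$.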
The accompanying experiments and code can be found at \url{github.com/DebarghaG/info-explain}.
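A minimal sketch of how such an information-theoretic benchmark might be implemented, assuming only input features, a model, and candidate explainers; the scoring rule and the use of \verb|mutual_info_regression| here are our assumptions for illustration, not the paper's \verb|InfoExplain| procedure (see the repository above for the actual test):
\begin{verbatim}
# Hedged sketch: rank explainers by how much information their
# attributions carry about the model's predictions. This is an
# illustrative stand-in, NOT the InfoExplain algorithm itself.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def information_score(attributions, preds):
    # Sum of per-feature mutual-information estimates between the
    # attribution matrix and the model's scalar outputs.
    return mutual_info_regression(attributions, preds).sum()

def rank_explainers(explainers, model, X):
    preds = model(X)  # model outputs on the input features
    scores = {name: information_score(explain(model, X), preds)
              for name, explain in explainers.items()}
    # Higher estimated information = better explainer under this proxy.
    return sorted(scores.items(), key=lambda kv: -kv[1])
\end{verbatim}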