Keywords: Concept learning, Disentanglement learning, Explainability, Interpretability
Abstract: Recent work on Explainable AI has focused on concept-based explanations, in which deep learning models are explained in terms of high-level units of information, referred to as concepts. In parallel, the field of disentanglement learning has explored the related notion of finding underlying factors of variation in the data that have interpretability properties. Despite their overlapping purpose, the metrics used to evaluate the quality of concepts and of factors of variation in the two fields are not aligned, hindering systematic comparison. In this paper, we treat factors of variation as concepts and thereby unify the notation of concept learning and disentanglement learning. We then propose metrics for evaluating the quality of concept representations in both approaches, both in the presence and in the absence of ground-truth concept labels. Using these metrics, we benchmark state-of-the-art methods from both families and propose a set of guidelines to determine the impact that supervision may have on the quality of learnt concept representations.
One-sentence Summary: This paper proposes metrics and guidelines for comparing the quality of concept representations within and between concept learning and disentanglement learning.