Abstract: The widespread adoption of convolutional neural networks and transformers for image classification has made the demand for transparency in deep models more urgent, especially when decisions involve sensitive issues such as people's health. In this paper we present metrics to evaluate the performance of visual explanations of AI classifiers against the annotations of expert practitioners. These metrics are used to assess whether state-of-the-art deep models for X-ray image classification capture the right information, and to identify the most effective methods for producing explanations from black-box models. Insights from this analysis, carried out on the most comprehensive dataset on thoracic diseases (ChestX-ray14), could help practitioners understand the potential and limitations of deep models in computer-aided diagnosis.
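The abstract does not spell out the metrics themselves; as an illustration only, agreement between a model's visual explanation (a saliency map) and an expert's annotation is commonly quantified with intersection-over-union (IoU). The sketch below is a hypothetical implementation of that generic idea, not the paper's actual metrics; the function name, the thresholding scheme, and the binary-mask inputs are all assumptions.

```python
import numpy as np

def saliency_iou(saliency, annotation_mask, threshold=0.5):
    """Illustrative IoU between a binarized saliency map and an expert
    annotation mask. Hypothetical sketch, not the paper's metric.

    saliency:        2-D float array in [0, 1] (explanation heatmap)
    annotation_mask: 2-D binary array (expert-annotated region)
    threshold:       cutoff used to binarize the saliency map
    """
    explanation = saliency >= threshold           # binarize the heatmap
    annotation = annotation_mask.astype(bool)
    intersection = np.logical_and(explanation, annotation).sum()
    union = np.logical_or(explanation, annotation).sum()
    return intersection / union if union > 0 else 0.0

# Toy example: a 2x2 heatmap against a 2x2 expert mask.
heatmap = np.array([[0.9, 0.2],
                    [0.6, 0.1]])
expert = np.array([[1, 0],
                   [1, 1]])
score = saliency_iou(heatmap, expert)  # intersection 2, union 3
```

A score close to 1 would indicate that the explanation highlights the same region the practitioner annotated; a score near 0 would suggest the model attends to clinically irrelevant areas.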