Abstract: Explainable AI (XAI) is a rapidly growing domain with a myriad of proposed methods as well as metrics aiming to evaluate their efficacy. However, current literature is often of limited scope, examining only a handful of XAI methods and ignoring underlying design parameters that affect performance, such as the model architecture or the nature of the input data. Moreover, such studies often rely on one or a few metrics, neglecting thorough validation and increasing the risk of selection bias. These shortcomings leave practitioners confused about which method to choose for their problem. In response, we introduce LATEC, a large-scale benchmark that critically evaluates 17 prominent XAI methods using 20 distinct metrics. We systematically incorporate vital design parameters such as varied architectures and diverse input modalities, resulting in 7,560 examined combinations. Through LATEC, we first showcase the high risk of conflicting metrics leading to unreliable rankings and, in response, propose a robust evaluation scheme. We then comprehensively evaluate the XAI methods to assist practitioners in selecting methods that align with their needs. Curiously, the emerging top-performing method, Expected Gradients, has not been examined in any prior related study. LATEC reinforces its role in future XAI research by publicly releasing all auxiliary data, including model weights, over 326k saliency maps, and 378k metric scores, as a dataset. The benchmark is hosted at: https://github.com/kjdhfg/LATEC.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Quanshi_Zhang1
Submission Number: 2432