Explainable Artificial Intelligence for Bioactivity Prediction: Unveiling the Challenges with Curated CDK2/4/6 Breast Cancer Dataset

Published: 01 Jan 2025, Last Modified: 24 Jul 2025. Venue: ICCS (1) 2025. License: CC BY-SA 4.0
Abstract: In recent years, the interplay between machine learning (ML) and cheminformatics has driven advances in bioactivity prediction. However, the challenge of model explainability remains a significant barrier to adopting these approaches in drug discovery. This study addresses critical shortcomings in existing modeling techniques by examining the assumptions of feature independence and contribution additivity that underpin traditional explainability methods. We investigate fingerprint-based and molecular graph-based models within quantitative structure-activity relationship (QSAR) modeling. While these models demonstrate impressive predictive performance, they offer limited actionable insights for medicinal chemists. To assist researchers in developing useful and interpretable activity prediction models, we propose a new benchmark based on the pharmacophore concept, which is commonly used in preliminary compound filtering. Furthermore, we introduce PharmacoScore, a novel evaluation metric designed to assess whether ML-based explanations prioritize essential pharmacophore components over non-critical features. Our findings highlight a crucial misalignment between ML model explanations and established pharmacophore principles, revealing a pressing need for innovative interpretability strategies in cheminformatics. This work not only offers a valuable resource but also sets the stage for future research, enhancing the transparency of ML in drug discovery.
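The abstract does not spell out how PharmacoScore is computed, so the following is only a minimal illustrative sketch of the general idea of a pharmacophore-alignment measure, not the authors' actual metric: given per-atom attribution scores from any explanation method and the indices of atoms belonging to essential pharmacophore features, it reports the fraction of the explanation's absolute attribution mass that lands on those essential atoms. The function name and the fraction-based definition are assumptions made for illustration.

```python
# Hypothetical sketch of a pharmacophore-alignment score; NOT the paper's
# definition of PharmacoScore. It measures how much of an explanation's
# absolute per-atom attribution falls on known pharmacophore atoms.

import numpy as np


def pharmacophore_alignment(attributions, pharmacophore_atoms):
    """Fraction of total absolute attribution assigned to pharmacophore atoms.

    attributions        : per-atom attribution scores from an explanation
                          method (e.g., SHAP values mapped onto atoms).
    pharmacophore_atoms : indices of atoms forming essential pharmacophore
                          features (H-bond donors/acceptors, aromatic rings, ...).
    Returns a value in [0, 1]; higher means the explanation concentrates
    on chemically essential atoms rather than non-critical ones.
    """
    attr = np.abs(np.asarray(attributions, dtype=float))
    total = attr.sum()
    if total == 0.0:  # degenerate explanation with no signal at all
        return 0.0
    mask = np.zeros_like(attr, dtype=bool)
    mask[list(pharmacophore_atoms)] = True
    return float(attr[mask].sum() / total)


# Toy usage: a 6-atom molecule where atoms 0 and 3 carry pharmacophore features.
scores = [0.9, 0.05, 0.1, 0.8, 0.02, 0.03]
print(pharmacophore_alignment(scores, {0, 3}))  # ~0.89 -> well-aligned explanation
```

A rank-based variant (e.g., how highly pharmacophore atoms are ranked by attribution magnitude) would be an equally plausible design; the key point the abstract makes is that explanations should be scored against pharmacophore knowledge rather than judged only by predictive accuracy.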