Addressing Model Overcomplexity in Drug-Drug Interaction Prediction With Molecular Fingerprints

Manel Gil-Sorribes; Alexis Molina

Published: 06 Mar 2025, Last Modified: 26 Apr 2025, GEM, CC BY 4.0

Track: Machine learning: computational method and/or computational results

Nature Biotechnology: Yes

Keywords: drug-drug interaction, molecular representation, benchmark, molecular fingerprint, explainability

TL;DR: We show that simple molecular fingerprint embeddings outperform or match complex deep learning models in drug-drug interaction and affinity prediction, providing a more interpretable and computationally efficient alternative.

Abstract: Accurately predicting drug-drug interactions (DDIs) is crucial for pharmaceutical research and clinical safety. Recent deep learning models often suffer from high computational costs and limited generalization across datasets. In this study, we investigate a simpler yet effective approach in which molecular representations, such as Morgan fingerprints (MFPS), graph-based embeddings from graph convolutional networks (GCNs), and transformer-derived embeddings from MoLFormer, are fed into a straightforward neural network. We benchmark our implementation on DrugBank DDI splits and a drug-drug affinity (DDA) dataset from the Food and Drug Administration. MFPS, along with MoLFormer and GCN representations, achieve competitive performance across tasks, even in the more challenging leak-proof split, highlighting the sufficiency of simple molecular representations. Moreover, gradient-based analyses of the representations under study let us identify key molecular motifs and structural patterns relevant to drug interactions. Despite these results, dataset limitations such as insufficient chemical diversity, limited dataset size, and inconsistent labeling hinder robust evaluation and call into question the need for more complex approaches. Our work provides a meaningful baseline and emphasizes the need for better dataset curation and progressive complexity scaling.
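
The general idea in the abstract (a pair of fingerprint vectors passed through a small feed-forward network, followed by a gradient-based look at which bits matter) can be illustrated with a minimal sketch. This is not the authors' implementation: the fingerprints below are random placeholder bit vectors rather than real Morgan fingerprints, the weights are untrained, and every name and size (`N_BITS`, `HIDDEN`, `init_params`, `forward`, `input_gradient`) is hypothetical. Gradient-times-input is used here as one common gradient-based attribution; the abstract does not specify which variant was applied.

```python
import math
import random

N_BITS = 16  # toy fingerprint length (assumption; real fingerprints are much longer)
HIDDEN = 8   # hidden layer width (assumption)


def init_params(n_in, n_hidden, seed=0):
    """Random, untrained weights for a one-hidden-layer network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(x, params):
    """Return hidden pre-activations and the sigmoid interaction probability."""
    w1, b1, w2, b2 = params
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w1, b1)]
    hidden = [max(0.0, p) for p in pre]  # ReLU
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    prob = 1.0 / (1.0 + math.exp(-logit))
    return pre, prob


def input_gradient(x, params):
    """Gradient of the predicted probability with respect to each input bit."""
    w1, _, w2, _ = params
    pre, prob = forward(x, params)
    dlogit = prob * (1.0 - prob)  # derivative of the sigmoid
    # Backpropagate through the ReLU: only active hidden units pass gradient.
    dpre = [dlogit * w2[j] if pre[j] > 0 else 0.0 for j in range(len(pre))]
    return [sum(dpre[j] * w1[j][i] for j in range(len(pre))) for i in range(len(x))]


# Placeholder fingerprints for a hypothetical drug pair (random bits, not real molecules).
rng = random.Random(1)
fp_a = [rng.randint(0, 1) for _ in range(N_BITS)]
fp_b = [rng.randint(0, 1) for _ in range(N_BITS)]
x = [float(bit) for bit in fp_a + fp_b]  # concatenate the two drug representations

params = init_params(len(x), HIDDEN)
_, prob = forward(x, params)
grads = input_gradient(x, params)

# Gradient-times-input saliency: bits that are off receive zero attribution.
saliency = [g * xi for g, xi in zip(grads, x)]
top_bits = sorted(range(len(x)), key=lambda i: abs(saliency[i]), reverse=True)[:5]
print(f"predicted interaction probability: {prob:.3f}")
print("most influential input positions:", top_bits)
```

In a real pipeline the highlighted bit positions would be mapped back to the substructures that set them, which is what makes fingerprint-based attributions readable as molecular motifs.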

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Presenter: ~Manel_Gil-Sorribes1

Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.

Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.

Submission Number: 109
