CypEGAT: A Deep Learning Framework Integrating Protein Language Model and Graph Attention Networks for Enhanced CYP450s Substrate Prediction

Published: 20 Dec 2024, Last Modified: 28 Dec 2024
Venue: AI4Research @ AAAI 2025 Poster
License: CC BY 4.0
Keywords: Enzyme-substrate prediction; Deep learning; Drug Discovery; Protein language model; Graph attention network; Feature fusion
TL;DR: We present a deep learning framework that combines protein and molecule information to predict substrates of human Cytochrome P450 enzymes.
Abstract: Human Cytochrome P450 enzymes (CYP450s) are responsible for metabolizing 70%-80% of clinically used drugs. The development of computational tools to accurately predict CYP450 enzyme-substrate interactions is crucial for drug discovery and chemical toxicology studies. In this work, we introduce CypEGAT, a deep learning framework designed to enhance prediction performance by integrating protein embeddings of CYP450s (extracted using the pre-trained ESM-2 Transformer model) with molecular embeddings generated by our fine-tuned Graph Attention Network (GAT). The CypEGAT model was trained end-to-end on two large-scale experimental enzyme-substrate datasets and our CYP450s dataset, which comprises 51,753 CYP450 enzyme-substrate pairs and 27,857 enzyme-nonsubstrate pairs. Focusing on five major human CYP450 isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4), CypEGAT achieves an overall predictive accuracy of 0.882 and an AUROC of 0.928. The model demonstrates robust generalizability to novel chemical compounds across different CYP450 isoforms, underscoring its potential as a powerful tool for drug metabolism studies.
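Illustrative sketch (not the authors' code): the abstract describes fusing a protein embedding with a GAT-derived molecular embedding to score an enzyme-substrate pair. The toy example below shows one way such a fusion could look, in pure Python. Everything in it is an assumption made for illustration: the function names (`gat_layer`, `mean_pool`, `predict_substrate`), the single attention head, mean pooling over atoms, fusion by concatenation, the logistic output layer, the tiny dimensions, and the random weights. The 4-dimensional `protein_emb` is only a stand-in for a real ESM-2 embedding, and nothing here reproduces the reported results.

```python
import math
import random


def gat_layer(node_feats, edges, weight, attn_vec):
    """One single-head graph attention layer on an undirected graph (toy sketch)."""
    n = len(node_feats)
    # Linear projection h_i = W x_i, with W given as a list of output rows.
    proj = [[sum(w * x for w, x in zip(row, feats)) for row in weight]
            for feats in node_feats]
    # Neighbour sets, including a self-loop for every node.
    nbrs = {i: {i} for i in range(n)}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    out = []
    for i in range(n):
        js = sorted(nbrs[i])
        # Unnormalised scores e_ij = LeakyReLU(a . [h_i || h_j]).
        scores = []
        for j in js:
            s = sum(a * v for a, v in zip(attn_vec, proj[i] + proj[j]))
            scores.append(s if s > 0 else 0.2 * s)
        # Softmax over the neighbourhood gives attention weights alpha_ij.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # New node embedding: attention-weighted sum of neighbour projections.
        out.append([sum(al * proj[j][k] for al, j in zip(alphas, js))
                    for k in range(len(weight))])
    return out


def mean_pool(node_embs):
    """Average node embeddings into one molecule-level vector."""
    return [sum(col) / len(node_embs) for col in zip(*node_embs)]


def predict_substrate(protein_emb, node_feats, edges, params):
    """Concatenate protein and molecule embeddings, then apply a logistic layer."""
    mol_emb = mean_pool(gat_layer(node_feats, edges, params["W"], params["a"]))
    fused = list(protein_emb) + mol_emb  # assumed fusion: plain concatenation
    logit = sum(w * x for w, x in zip(params["w_out"], fused)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: a 3-atom chain with one-hot atom features (hypothetical values).
node_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
edges = [(0, 1), (1, 2)]
protein_emb = [0.1, -0.3, 0.5, 0.2]  # stand-in for an ESM-2 embedding

rng = random.Random(0)
params = {
    "W": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)],  # 3 -> 2
    "a": [rng.uniform(-1, 1) for _ in range(4)],      # length 2 * out_dim
    "w_out": [rng.uniform(-1, 1) for _ in range(6)],  # protein 4 + molecule 2
    "b_out": 0.0,
}
prob = predict_substrate(protein_emb, node_feats, edges, params)
print(f"predicted substrate probability: {prob:.3f}")
```

In a trained model the weights would be learned from labelled enzyme-substrate and enzyme-nonsubstrate pairs rather than drawn at random, and the output would be thresholded to produce the binary predictions behind metrics such as accuracy and AUROC.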
Archival Option: The authors of this submission want it to appear in the archival proceedings.
Submission Number: 31