Keywords: Graph Transformer, Spectral Diagnostic
Abstract: Predicting drug-target interactions (DTIs) is important for accelerating drug discovery. Prevailing approaches often assume access to labeled target data or entangle training with opaque unsupervised alignment losses, which makes robustness hard to audit and failure modes difficult to diagnose. To address these gaps, we propose MoleProLink, a domain-shift-aware DTI predictor for mining bioactive molecules that integrates methods inspired by measure-theoretic optimal transport, reproducing-kernel embeddings, and information geometry. On the theory side, we present two compact risk-transfer bounds under explicit assumptions: (i) Wasserstein-1 control under a Lipschitz-regularity assumption on the composed loss, and (ii) RKHS control via the Maximum Mean Discrepancy (MMD). These statements are standard IPM-style bounds restated in DTI-specific notation; we use them to motivate diagnostics and feature-design principles, not to claim new inequalities. On the methodology side, we pair a graph Transformer for molecular graphs with a sequence encoder for proteins. Proteins are embedded with a residue-level embedding (named Residue2vec) and a bidirectional state-space model, while molecules are embedded through centrality and spatial encodings in a state-space-model graph Transformer. Under a single evaluation protocol, experiments on three popular benchmarks (Human, C. elegans, and Davis) show that our method achieves strong AUC/AUPR. Gains over baselines are obtained under identical data processing and negative sampling; we regard these margins as descriptive rather than inferential. We provide implementation details sufficient for direct replication and report ablation experiments that isolate the contributions of the protein sequence encoder and the interaction decoder.
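For concreteness, here is a minimal sketch of the two standard IPM-style bounds the abstract refers to; the notation (source and target feature distributions $P_S$, $P_T$, hypothesis $h$, composed loss $\ell$, kernel $k$) is assumed for illustration rather than fixed by the abstract. Writing $R_P(h)=\mathbb{E}_{z\sim P}[\ell(h,z)]$ for the risk under distribution $P$:

\[
  \bigl|R_{P_T}(h)-R_{P_S}(h)\bigr| \;\le\; L \, W_1(P_S, P_T)
  \qquad \text{if } z \mapsto \ell(h,z) \text{ is } L\text{-Lipschitz},
\]
\[
  \bigl|R_{P_T}(h)-R_{P_S}(h)\bigr| \;\le\; \mathrm{MMD}_k(P_S, P_T)
  \qquad \text{if } z \mapsto \ell(h,z) \text{ lies in the unit ball of } \mathcal{H}_k.
\]

Both follow directly from the variational (dual) form of $W_1$ and $\mathrm{MMD}_k$ as integral probability metrics; they motivate monitoring these discrepancies as domain-shift diagnostics, not new forward inequalities.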
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 9876