Keywords: Binding Affinity Prediction, BPE, Virtual Screening
Abstract: Accurate molecular representations are critical for drug discovery. A central
challenge lies in capturing the chemical environment of molecular fragments, because
key interactions, such as hydrogen bonding and π stacking, occur only under specific
local conditions. Most existing approaches represent molecules as atom-level
graphs; however, individual atoms cannot express stereochemistry, lone pairs,
conjugation, and other complex features. Fragment-based methods (e.g., principal
subgraph or functional group libraries) fail to preserve essential information such
as chirality, aromatic bond integrity, and ionic states. This work addresses these
limitations in two ways. (i) OverlapBPE tokenization. We propose a
novel data-driven molecule tokenization method. Unlike existing approaches, our
method allows overlapping fragments, reflecting the inherently fuzzy boundaries
of small-molecule substructures. Together with enriched chemical information
at the token level, this preserves a more complete chemical context. (ii) h-MINT
model. We develop a hierarchical molecular interaction network capable
of jointly modeling drug–target interactions at both atom and fragment levels. By
supporting fragment overlaps, the model naturally accommodates the many-to-many
atom–fragment mappings introduced by the OverlapBPE scheme. Extensive
evaluation against state-of-the-art methods shows that our method improves binding
affinity prediction by 2-4% in Pearson/Spearman correlation on PDBBind and LBA,
enhances virtual screening by 1-3% in key metrics on DUD-E and LIT-PCBA, and
achieves the best overall HTS performance on PubChem assays. Further analysis
demonstrates that our method effectively captures interactive information while
maintaining good generalization.
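
The many-to-many atom–fragment mapping that overlapping fragments produce can be illustrated with a minimal sketch. This is not the OverlapBPE algorithm or the h-MINT data structure; the function names, fragment names, and the atom indexing of the toy molecule are all hypothetical and chosen only to show how a single atom can belong to more than one fragment.

```python
from collections import defaultdict


def build_atom_fragment_maps(fragments):
    """Build both directions of the atom-fragment mapping.

    `fragments` maps a (hypothetical) fragment name to the atom indices it
    covers. Fragments may overlap, so an atom can appear in several of them.
    """
    frag_to_atoms = {name: sorted(set(atoms)) for name, atoms in fragments.items()}
    atom_to_frags = defaultdict(list)
    for name, atoms in frag_to_atoms.items():
        for atom in atoms:
            atom_to_frags[atom].append(name)
    return frag_to_atoms, dict(atom_to_frags)


def shared_atoms(atom_to_frags):
    """Return the atoms that belong to more than one fragment."""
    return sorted(a for a, frags in atom_to_frags.items() if len(frags) > 1)


# Toy benzoic-acid-like molecule with 9 heavy atoms (indexing is an assumption):
# atoms 0-5 form the ring, atom 6 is the carboxyl carbon, atoms 7-8 are oxygens.
toy_fragments = {
    "aryl_ring": [0, 1, 2, 3, 4, 5],
    "carboxyl": [5, 6, 7, 8],  # includes ring atom 5, so the fragments overlap
}

frag_to_atoms, atom_to_frags = build_atom_fragment_maps(toy_fragments)
print(shared_atoms(atom_to_frags))  # atoms covered by more than one fragment
print(atom_to_frags[5])             # fragments that contain atom 5
```

With a non-overlapping tokenization, `shared_atoms` would always be empty and each atom would map to exactly one fragment; allowing overlap is what makes the mapping many-to-many.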
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 9848