BioSensGraph: Predicting Biopolymer Interactions via Knowledge Graph Embedding on a Property Graph of Molecular Entities
Keywords: knowledge graph, knowledge graph embedding, link prediction, biosensor
TL;DR: A large-scale biomolecular knowledge graph (~1.3M entities, ~43M edges) is built by integrating heterogeneous data sources and evaluated for link prediction with PyTorch-BigGraph embeddings.
Abstract: Existing biomedical knowledge graphs are primarily geared toward drug repurposing and pathway analysis (gene–disease–drug). In biosensing, however, the primary early-stage task is different: selecting recognition elements (REs) that bind selectively to a given analyte. We present a large-scale biomolecular knowledge graph that aggregates data from 15 heterogeneous open sources. It contains ~1.3M entities and ~43M edges of three types: interacts_with (experimental analyte–RE interactions), has_similarity (structural or sequence similarity), and has_biomarker (associations with physiological conditions). Despite the sparsity typical of such graphs, it is highly connected (97% of nodes lie in the giant component) and exhibits heavy-tailed degree distributions.
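The two connectivity statistics quoted above (share of nodes in the giant component, degree distribution) can be computed from a plain edge list in a few lines. The sketch below is illustrative only: it assumes integer node ids, treats every edge type as undirected, and uses a hypothetical toy graph; the tooling actually used to build and profile the graph is not described here.

```python
from collections import Counter


def giant_component_fraction(num_nodes, edges):
    """Fraction of nodes in the largest connected component.

    Edges are treated as undirected; node ids are integers in [0, num_nodes).
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find with path halving to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    # Count how many nodes share each root; the biggest group is the giant component.
    sizes = Counter(find(x) for x in range(num_nodes))
    return max(sizes.values()) / num_nodes


def degree_counts(num_nodes, edges):
    """Undirected degree of every node (input to a degree-distribution plot)."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Hypothetical toy graph: nodes 0-4 form one component, node 5 is isolated.
edges = [(0, 1), (1, 2), (2, 3), (0, 4)]
print(giant_component_fraction(6, edges))  # 0.8333333333333334
print(degree_counts(6, edges))             # [2, 2, 2, 1, 1, 0]
```

At the scale of ~43M edges the same union-find pass stays near-linear, which is why it is a common choice over repeated graph traversals for this statistic.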
We cast RE selection as large-scale link prediction on the symmetric interacts_with (IW) edges using PyTorch-BigGraph and introduce a symmetry-aware splitting protocol in which the two mirror orientations of an edge, (a, b) and (b, a), are never assigned to different splits. In a controlled comparison of relation operators and score comparators under a pairwise ranking loss, the unit-norm DistMult (cosine) configuration gives the most stable results (MRR = 0.457, Hits@10 = 0.822) on a test set of ~2.6M triples. A lightweight web interface supports interactive navigation and analysis of the graph. Overall, our KG and protocol yield a ranking of analyte–RE pairs oriented toward in vitro validation, helping to narrow the experimental search space and accelerate the transition to sensor prototypes.
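The symmetry-aware protocol and the reported metrics can be sketched as follows, assuming string entity ids and a hash-based split. The function names, the 10%/10% validation/test fractions, the margin, and the toy pairs are all hypothetical. Mirror pairs always share a split because each pair is hashed on an order-independent key. Ranks count ties against the true edge, which is one common (pessimistic) convention; the tie handling used in the paper is not stated. The loss is a generic margin-based pairwise ranking loss of the kind PyTorch-BigGraph's `ranking` loss implements, not necessarily the exact configuration behind the reported numbers.

```python
import hashlib


def canonical_pair(a: str, b: str) -> tuple:
    """Order-independent key: (a, b) and (b, a) map to the same tuple."""
    return (a, b) if a <= b else (b, a)


def assign_split(a: str, b: str, test_frac: float = 0.1, valid_frac: float = 0.1) -> str:
    """Deterministically assign an unordered interacts_with pair to a split.

    Both mirror orientations hash the same canonical key, so they can never
    land in different splits (no leakage of (b, a) into train when (a, b) is test).
    """
    key = "|".join(canonical_pair(a, b)).encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10_000 / 10_000
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + valid_frac:
        return "valid"
    return "train"


def pairwise_ranking_loss(pos_score: float, neg_scores: list, margin: float = 0.1) -> float:
    """Sum of hinge terms max(0, margin - pos + neg) over the sampled negatives."""
    return sum(max(0.0, margin - pos_score + s) for s in neg_scores)


def rank_of_positive(pos_score: float, neg_scores: list) -> int:
    """1-based rank of the true edge among its negatives; ties count against it."""
    return 1 + sum(1 for s in neg_scores if s >= pos_score)


def mrr_and_hits(ranks: list, k: int = 10) -> tuple:
    """Mean reciprocal rank and Hits@k over a list of 1-based ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits


# Hypothetical analyte/RE ids; each pair is checked in both orientations.
pairs = [("analyte:A1", "re:R7"), ("analyte:A2", "re:R3")]
for a, b in pairs:
    assert assign_split(a, b) == assign_split(b, a)

print(rank_of_positive(0.9, [0.1, 0.95, 0.5]))  # 2
print(mrr_and_hits([1, 3, 12]))                  # (0.4722..., 0.6666...)
```

Hashing on the canonical key, rather than shuffling directed edges, keeps both orientations together in a single pass without a second step to reconcile conflicting assignments.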
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 25530