Keywords: Molecular structure ranking, Heterogeneous co-occurrence graph
Abstract: Identifying molecular structures in environmental and biological samples is essential for assessing ecological risks and human health, yet it remains highly challenging because of the vast number of unidentified compounds. Tandem mass spectrometry (MS/MS) provides high-throughput spectrum measurements, but existing spectrum-driven identification approaches face two key limitations. Spectrum-isolated modeling methods are computationally expensive and tend to overlook molecular clustering effects, while network-based methods typically fail to incorporate environmental co-occurrence across chemical samples, which limits their performance. To address these challenges, we recast molecular identification as spectrum-driven molecular structure ranking and propose \textsc{MoleRanker}, a novel heterogeneous graph neural network that integrates chemical constraints with environmental co-occurrence patterns. Specifically, we first construct a heterogeneous co-occurrence graph that encodes both \textit{molecular-level chemical clustering effects} and \textit{sample-level environmental co-occurrence correlations}. We then design a multiplex-relation message-passing mechanism that propagates information across these heterogeneous relations in a relation-aware manner. We construct four diverse datasets, including in-situ environmental pollutants and human metabolomics, and release them as a benchmark for spectrum-driven molecular structure ranking. Extensive experiments demonstrate that \textsc{MoleRanker} achieves state-of-the-art performance, improving mean reciprocal rank (MRR) by 12.18\% on average. Beyond accuracy, our approach opens new opportunities for discovering emerging pollutants and advancing the molecular understanding of human metabolism through graph-based integration of chemical and environmental evidence. Code is available at \url{https://anonymous.4open.science/r/MoleRanker}.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 8505
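
The abstract reports results in mean reciprocal rank (MRR), the standard ranking metric that averages 1/rank of the correct item over all queries. The sketch below illustrates that standard definition only; it is not taken from the MoleRanker code. The function names `rank_of_true` and `mean_reciprocal_rank`, and the tie-handling choice, are assumptions made for illustration.

```python
from typing import Sequence


def rank_of_true(scores: Sequence[float], true_index: int) -> int:
    """Return the 1-based rank of the true candidate for one query spectrum.

    Assumption: higher score means better, and ties are resolved
    optimistically (only strictly higher scores push the rank down).
    """
    target = scores[true_index]
    return 1 + sum(1 for s in scores if s > target)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries, with 1-based ranks."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks must be 1-based positive integers")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical candidate scores for three query spectra; the second
# element of each pair is the index of the correct structure.
queries = [
    ([0.9, 0.1, 0.3], 0),  # true structure ranked 1st
    ([0.2, 0.9, 0.5], 2),  # true structure ranked 2nd
    ([0.8, 0.7, 0.6, 0.1], 3),  # true structure ranked 4th
]
ranks = [rank_of_true(scores, idx) for scores, idx in queries]
mrr = mean_reciprocal_rank(ranks)
```

With these made-up scores the ranks are 1, 2, and 4, so the MRR is (1 + 1/2 + 1/4) / 3. An MRR of 1.0 would mean the correct structure is always ranked first.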