Keywords: Imbalanced Learning, Imbalanced Regression, Graph-based Learning, Graph Representation Learning, Molecular Property Prediction
Abstract: Imbalanced regression is pervasive in molecular property prediction, where the most valuable compounds (e.g., high potency) occupy sparse regions of the label space. Standard Graph Neural Networks (GNNs) optimize average error and underperform on these rare but critical cases, while existing oversampling methods often distort molecular topology. We introduce SPECTRA, a Spectral Target-Aware graph augmentation framework that generates realistic molecular graphs in the spectral domain. SPECTRA (i) reconstructs multi-attribute molecular graphs from SMILES; (ii) aligns molecule pairs via (Fused) Gromov–Wasserstein couplings to obtain node correspondences; (iii) interpolates Laplacian eigenvalues/eigenvectors and node features in a stable shared basis; and (iv) reconstructs edges to synthesize physically plausible intermediates with interpolated targets. A rarity-aware budgeting scheme, derived from a kernel density estimate of the labels, concentrates augmentation where data are scarce. Coupled with a spectral GNN using edge-aware Chebyshev convolutions, SPECTRA densifies underrepresented regions without degrading global structure. On molecular property regression benchmarks, SPECTRA consistently reduces error in rare target ranges while maintaining competitive overall MAE, and yields interpretable synthetic molecules whose structure reflects the underlying spectral geometry. Our results demonstrate that spectral, geometry-aware augmentation is an effective and efficient strategy for imbalanced molecular property regression.
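The abstract gives no implementation details, so what follows is only a minimal NumPy sketch, under stated assumptions, of two ingredients it names: a rarity-aware budget derived from a Gaussian KDE over the labels, and a spectral interpolation between two molecular graphs. It assumes the two graphs already have the same node count and are node-aligned (standing in for the (F)GW coupling step), uses the combinatorial Laplacian, replaces the paper's "stable shared basis" with a sign-aligned linear blend of eigenvectors re-orthonormalized by QR, reconstructs edges with a fixed threshold, and interpolates targets linearly. Node and edge attributes are ignored. All function names (`kde_density`, `rarity_budget`, `spectral_mixup`) and parameters (`alpha`, `edge_threshold`) are hypothetical.

```python
import numpy as np


def kde_density(y, bandwidth=None):
    """Gaussian KDE of the 1-D labels y, evaluated at every label."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed choice, not from the paper).
        bandwidth = 1.06 * y.std(ddof=1) * n ** (-1 / 5)
    if not bandwidth > 0:
        bandwidth = 1.0  # fallback when the labels have no spread
    z = (y[:, None] - y[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def rarity_budget(y, total_budget, alpha=1.0, eps=1e-12):
    """Split total_budget synthetic samples across anchors, favoring rare labels."""
    weights = (kde_density(y) + eps) ** (-alpha)  # low density -> high weight
    weights /= weights.sum()
    raw = weights * total_budget
    counts = np.floor(raw).astype(int)
    # Largest-remainder rounding so the counts sum exactly to total_budget.
    leftover = int(total_budget - counts.sum())
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:leftover]] += 1
    return counts


def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(adj.sum(axis=1)) - adj


def spectral_mixup(adj_a, adj_b, y_a, y_b, t, edge_threshold=0.5):
    """Blend two node-aligned graphs of equal size in the Laplacian eigenbasis."""
    evals_a, evecs_a = np.linalg.eigh(laplacian(adj_a).astype(float))
    evals_b, evecs_b = np.linalg.eigh(laplacian(adj_b).astype(float))
    # Eigenvectors are defined only up to sign: flip B's columns to agree with A's.
    signs = np.sign(np.sum(evecs_a * evecs_b, axis=0))
    signs[signs == 0] = 1.0
    evecs_b = evecs_b * signs
    # Linear blend of spectra and bases, then QR to restore orthonormality.
    evals_t = (1 - t) * evals_a + t * evals_b
    basis, _ = np.linalg.qr((1 - t) * evecs_a + t * evecs_b)
    lap_t = basis @ np.diag(evals_t) @ basis.T
    # Off-diagonal entries of -L act as soft edge weights; threshold them.
    weights = -lap_t
    np.fill_diagonal(weights, 0.0)
    adj_t = (weights > edge_threshold).astype(int)
    y_t = (1 - t) * y_a + t * y_b  # linearly interpolated target (assumed)
    return adj_t, y_t
```

For example, `rarity_budget(y, 100)` returns how many synthetic samples to seed from each training molecule, with the most isolated labels receiving the largest share, and `spectral_mixup` with `t` in (0, 1) yields an intermediate adjacency and target. At `t = 0` and `t = 1` it recovers the endpoint adjacencies, which is a useful sanity check; for repeated eigenvalues the intermediate blend is not uniquely defined, one reason a more careful shared basis is needed in practice.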
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 25003