Keywords: Geometric Deep Learning, Solubility Prediction, SE(3)-Equivariance
Abstract: Accurate prediction of small molecule solubility requires balancing physical fidelity with computational scalability. While geometric deep learning offers strong inductive biases for molecular systems, applying full SE(3)-equivariance to dynamic multi-component systems can introduce substantial computational overhead. We introduce Solvaformer, a graph transformer for solubility prediction that selectively grounds interactions in geometry. The architecture applies SE(3)-equivariant attention to rigid intramolecular structure, while modeling fluid intermolecular interactions through computationally efficient scalar attention. We train Solvaformer in a multi-task setting on a combined dataset of quantum-mechanical calculations (CombiSolv-QM) and experimental measurements (BigSolDB 2.0). Solvaformer demonstrates strong performance, approaching the DFT-based baseline while remaining end-to-end and scalable. We also compare against a simpler MPNN augmented with machine-learning interatomic potential (MLIP)-derived partial charges, which achieves slightly better predictive accuracy. This suggests that for scalar solubility prediction, high-quality electronic descriptors can provide an effective alternative to explicit equivariant processing. Nevertheless, Solvaformer remains the best-performing end-to-end model that does not rely on external feature-generation pipelines, and its attention maps retain chemically meaningful interpretability, including the ability to distinguish intra- from intermolecular hydrogen bonding. These results highlight two practical strategies for scalable solution-phase modeling: explicit geometric learning within the architecture, and invariant prediction supported by physics-informed descriptors.
Submission Track: Full Paper
Submission Category: Automated Synthesis
Submission Number: 22