Bridging the Gap Between Cross-Domain Theory and Practical Application: A Case Study on Molecular Dissolution
Keywords: Artificial intelligence in chemical research, gain-based core subset selection, asymmetric molecular interaction graph neural networks
Abstract: Artificial intelligence (AI) has played a transformative role in chemical research, facilitating the prediction of small-molecule properties, the simulation of catalytic processes, and materials design. These advances are driven by increases in computing power, open-source machine learning frameworks, and extensive chemical datasets. However, high-quality real-world data remain scarce, while models trained on large amounts of theoretically computed data are often costly and difficult to deploy; both factors limit the applicability of AI models in real-world scenarios. In this study, we improve the prediction of solute-solvent properties by proposing a novel sample selection method: the iterative core subset extraction (CSIE) framework. CSIE iteratively updates the core sample subset based on information gain, removing redundant features from the theoretical data and optimizing model performance on real chemical datasets. Furthermore, we introduce an asymmetric molecular interaction graph neural network (AMGNN) that combines positional information with bidirectional edge connections to simulate real-world chemical reaction scenarios and better capture solute-solvent interactions. Experimental results show that our method accurately extracts the core subset and improves prediction accuracy.
Supplementary Material: zip
Primary Area: Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)
Submission Number: 7538
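The abstract describes CSIE only at a high level: a core subset of theoretical samples is updated iteratively, guided by a gain criterion. The following is a minimal illustrative sketch of that kind of loop, not the authors' implementation. Everything in it is an assumption made for illustration: the 1-D toy samples, the nearest-neighbour surrogate model, the function names (`predict_1nn`, `mse`, `select_core_subset`), and above all the use of the reduction in error on real data as a stand-in for the paper's information gain, which the abstract does not define.

```python
from typing import List, Tuple

# Hypothetical toy sample: (feature, target). A single float stands in
# for a molecular descriptor; real inputs would be graphs or vectors.
Sample = Tuple[float, float]


def predict_1nn(core: List[Sample], x: float) -> float:
    """Predict with the nearest core sample (toy surrogate model)."""
    nearest = min(core, key=lambda s: abs(s[0] - x))
    return nearest[1]


def mse(core: List[Sample], real: List[Sample]) -> float:
    """Mean squared error of the surrogate on the real samples."""
    return sum((predict_1nn(core, x) - y) ** 2 for x, y in real) / len(real)


def select_core_subset(
    pool: List[Sample],
    real: List[Sample],
    seed_index: int = 0,
    min_gain: float = 1e-9,
) -> List[Sample]:
    """Greedily grow a core subset of the theoretical pool.

    Assumption: "gain" is the drop in error on the real data when a
    candidate is added. This is a proxy, not the paper's definition.
    """
    core = [pool[seed_index]]
    remaining = [s for i, s in enumerate(pool) if i != seed_index]
    current = mse(core, real)
    while remaining:
        # Score every remaining candidate by the gain it would bring.
        gains = [(current - mse(core + [s], real), i)
                 for i, s in enumerate(remaining)]
        best_gain, best_i = max(gains)
        if best_gain <= min_gain:
            break  # every remaining candidate is redundant
        core.append(remaining.pop(best_i))
        current = mse(core, real)
    return core


# Toy data: a large "theoretical" pool and a small "real" set.
pool = [(0.0, 0.0), (0.1, 0.05), (1.0, 1.0), (1.05, 0.9), (2.0, 2.0)]
real = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9)]
core = select_core_subset(pool, real)
print(core)
```

In this toy run the two near-duplicate pool samples are never selected, because adding them does not reduce the error on the real data. A stopping rule based on a minimum gain is one simple design choice; a fixed subset size or a removal step for previously selected samples would be equally plausible variants.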