Modeling non-uniform uncertainty in Reaction Prediction via Boosting and Dropout

21 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: AI for Chemistry, Deep Generative Models, Chemical Reaction
Abstract: Reaction prediction is widely recognized as a critical task in synthetic chemistry. Some recent methods model reaction prediction non-autoregressively to achieve efficient parallel decoding. All previous non-autoregressive reaction prediction methods use a conditional VAE (CVAE) to model uncertainty, which carries two implicit assumptions: (1) the prior is independent of the reactants, so every reactant by default has a wide product distribution; (2) similar reactants have similar product distributions. However, we find that these assumptions do not match the reaction prediction task, which exhibits non-uniform uncertainty: the level of product uncertainty differs across reactants, and even for similar reactants the product distributions are not uniform and the products may not be similar. Directly applying a CVAE to reaction prediction yields a uniform product distribution for all reactants and forces the model to predict similar products for similar reactants, which impairs performance. To address this issue, we devise a non-uniform uncertainty framework for reaction product generation. We first remove the latent variable from the previous CVAE model to reduce uncontrollable noise. To introduce randomness into product generation, we apply boosting training, which yields models that differ substantially from one another, and dropout, which yields models that differ only slightly, so that together they cover uncertainty both precisely and diversely. We also design a simple ranking method that combines the products predicted by boosting and dropout and moves the most likely products to the front. Experimental results on USPTO-MIT, the largest reaction prediction benchmark, show that our method outperforms baselines in modeling non-uniform uncertainty.
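The abstract does not specify how the ranking step works, so the following is only a minimal sketch of one plausible way to merge candidates from several boosting models and several dropout passes. The function name `rank_candidates`, the vote-counting rule, the tie-breaking order, and the example SMILES strings are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def rank_candidates(samples_per_model):
    """Merge and rank predicted products (hypothetical ranking rule).

    samples_per_model[i] is the list of products (e.g. SMILES strings)
    predicted by boosting model i across several dropout-enabled passes.
    Products are ranked by how often they were predicted overall; ties are
    broken by the earliest (model index, pass index) at which they appeared.
    """
    votes = Counter()
    first_seen = {}
    for model_idx, samples in enumerate(samples_per_model):
        for pass_idx, product in enumerate(samples):
            votes[product] += 1
            # Remember only the first position at which a product appears.
            first_seen.setdefault(product, (model_idx, pass_idx))
    # Most-voted products first; earlier appearance wins ties.
    return sorted(votes, key=lambda p: (-votes[p], first_seen[p]))


# Illustrative input: two boosting models, three dropout passes each.
samples = [
    ["CCO", "CCO", "CC=O"],
    ["CC(=O)O", "CCO", "CC(=O)O"],
]
ranked = rank_candidates(samples)
print(ranked)
```

Under this assumed rule, a reactant with low uncertainty produces a short list dominated by one product, while a reactant with high uncertainty produces a longer list, so the number of distinct candidates varies per reactant.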
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 4229