Abstract: Machine learning models have recently shown promise in predicting molecular quantum chemical properties. However, the path to real-life adoption requires (1) learning under low-resource constraints and (2) out-of-distribution generalization to unseen, structurally diverse molecules. We observe that both challenges can be alleviated by abundant labels, which are rarely available in quantum chemistry. We hypothesize that pseudo-labels assigned to a vast array of unlabeled molecules can serve as proxies for gold labels and greatly expand the labeled training set. The challenge in pseudo-labeling is to prevent bad pseudo-labels from biasing the model. We develop a simple and effective strategy, Pseudo, that assigns pseudo-labels, detects bad pseudo-labels through evidential uncertainty, and prevents them from biasing the model using adaptive weighting. Empirically, Pseudo improves quantum calculation accuracy across full-data, low-data, and out-of-distribution settings.
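The three steps named in the abstract (assign pseudo-labels, score them with evidential uncertainty, down-weight the uncertain ones) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the Normal-Inverse-Gamma parameterization used in deep evidential regression, where a model outputs (gamma, nu, alpha, beta) and the epistemic variance is beta / (nu * (alpha - 1)). The abstract does not specify the weighting rule, so the exponential weight and its `temperature` parameter are hypothetical choices, as are all function names.

```python
import math


def epistemic_uncertainty(nu, alpha, beta):
    """Epistemic variance of a Normal-Inverse-Gamma evidential output.

    Assumed form (deep evidential regression): beta / (nu * (alpha - 1)).
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("requires nu > 0, alpha > 1, beta > 0")
    return beta / (nu * (alpha - 1.0))


def adaptive_weights(uncertainties, temperature=1.0):
    """Hypothetical weighting rule: w = exp(-u / temperature), so w lies in (0, 1].

    Confident pseudo-labels (small u) get weights near 1;
    uncertain ones are pushed toward 0.
    """
    return [math.exp(-u / temperature) for u in uncertainties]


def weighted_pseudo_label_loss(predictions, pseudo_labels, weights):
    """Weighted mean squared error of student predictions against pseudo-labels."""
    total = sum(
        w * (p - y) ** 2 for p, y, w in zip(predictions, pseudo_labels, weights)
    )
    norm = sum(weights)
    return total / norm if norm > 0 else 0.0


if __name__ == "__main__":
    # Made-up teacher outputs (gamma, nu, alpha, beta) for three unlabeled molecules.
    teacher_outputs = [(0.5, 4.0, 3.0, 0.8), (1.2, 0.5, 1.5, 2.0), (-0.3, 2.0, 2.0, 1.0)]
    pseudo_labels = [gamma for gamma, _, _, _ in teacher_outputs]
    u = [epistemic_uncertainty(nu, a, b) for _, nu, a, b in teacher_outputs]
    w = adaptive_weights(u, temperature=1.0)
    student_predictions = [0.4, 2.0, -0.1]
    print(weighted_pseudo_label_loss(student_predictions, pseudo_labels, w))
```

In this sketch the second molecule has little evidence (small nu and alpha), so its pseudo-label contributes far less to the loss than the other two.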
Supplementary Material: zip