Abstract: Machine learning models have recently shown promise in predicting molecular quantum chemical properties. However, real-world adoption requires (1) learning under low-resource constraints and (2) out-of-distribution generalization to unseen, structurally diverse molecules. We observe that both challenges stem from label scarcity. We hypothesize that pseudo-labels assigned to a vast pool of unlabeled molecules can serve as proxies for gold labels and greatly expand the labeled training data. The challenge in pseudo-labeling is to prevent bad pseudo-labels from biasing the model. We develop a simple and effective strategy, Pseudo-Sigma, that assigns pseudo-labels, detects bad pseudo-labels through evidential uncertainty, and then prevents them from biasing the model using adaptive weighting. Empirically, Pseudo-Sigma improves the accuracy of quantum calculations across full-data, low-data, and out-of-distribution settings.
Track: Original Research Track