Uncertainty quantification in machine learning and nonlinear least squares regression models

Ni Zhan, John R. Kitchin

20 May 2023 | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: Machine learning (ML) models are valuable research tools for making accurate predictions. However, ML models often extrapolate unreliably outside their training data. The multiparameter delta method quantifies uncertainty for ML models (and, more generally, for other nonlinear models) whose parameters are trained by least squares regression. The uncertainty measure requires the gradient of the model prediction and the Hessian of the loss function, both taken with respect to the model parameters. Both the gradient and the Hessian can be readily obtained from most ML software frameworks by automatic differentiation. We show examples of the uncertainty method in applications to molecular simulations and neural networks. We further show that the uncertainty measure is larger in regions of input space that are not covered by the training data. Therefore, this method can be used to identify extrapolation and to aid in selecting training data or assessing model reliability.
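The two ingredients named in the abstract, the gradient of the prediction and the Hessian of the loss, can be combined as in the following minimal JAX sketch. It is an illustration under stated assumptions, not the authors' implementation: the toy model `a * exp(b * x)`, the synthetic data, the Gauss-Newton fit, and all function names are hypothetical. It also assumes the loss is half the sum of squared errors, so that the parameter covariance is approximated by `sigma2 * inv(H)` with `sigma2 = SSE / (n - p)`; the abstract does not state the exact scaling the paper uses.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for the matrix inverse


def model(params, x):
    # Hypothetical two-parameter nonlinear model (an assumption for illustration).
    a, b = params
    return a * jnp.exp(b * x)


def loss(params, x, y):
    # Assumed convention: half the sum of squared errors.
    r = y - model(params, x)
    return 0.5 * jnp.sum(r ** 2)


def fit_gauss_newton(params, x, y, steps=20):
    # Simple least squares fit; any optimizer that reaches the minimum would do.
    for _ in range(steps):
        r = y - model(params, x)
        J = jax.jacobian(model)(params, x)  # shape (n, p)
        params = params + jnp.linalg.solve(J.T @ J, J.T @ r)
    return params


def prediction_std(params, x_train, y_train, x_new):
    n, p = x_train.shape[0], params.shape[0]
    # Hessian of the loss with respect to the parameters, by automatic differentiation.
    H = jax.hessian(loss)(params, x_train, y_train)
    # Residual variance estimate: SSE / (n - p), where SSE = 2 * loss.
    sigma2 = 2.0 * loss(params, x_train, y_train) / (n - p)
    cov = sigma2 * jnp.linalg.inv(H)  # approximate parameter covariance
    # Gradient of each prediction with respect to the parameters, shape (m, p).
    G = jax.jacobian(model)(params, x_new)
    # Delta method: var_i = g_i^T cov g_i for each new input.
    var = jnp.einsum("ip,pq,iq->i", G, cov, G)
    return jnp.sqrt(var)


# Synthetic training data on x in [0, 2] (assumed values, for illustration only).
x_train = jnp.linspace(0.0, 2.0, 20)
noise = 0.05 * jax.random.normal(jax.random.PRNGKey(0), (20,))
y_train = model(jnp.array([2.0, 0.5]), x_train) + noise

params = fit_gauss_newton(jnp.array([1.8, 0.4]), x_train, y_train)

# One input inside the training range and one well outside it.
std = prediction_std(params, x_train, y_train, jnp.array([1.0, 4.0]))
std_inside, std_outside = std[0], std[1]
print("std at x=1 (inside training range):", std_inside)
print("std at x=4 (extrapolation):", std_outside)
```

In this toy setting the standard deviation at `x = 4` comes out larger than at `x = 1`, which mirrors the abstract's claim that the measure grows outside the training data. For a neural network the same two calls, `jax.jacobian` on the prediction and `jax.hessian` on the loss, would apply to a flattened parameter vector, although the Hessian may then need regularization before it can be inverted.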
