Interval Neural Networks for Imprecise Training Data

03 Nov 2023 · OpenReview Archive Direct Upload
Abstract: Interval Predictor Models (IPMs) are robust regression models which learn to represent percentiles of the data distribution without using probabilistic priors [3]. The training of IPMs can be stated as a chance-constrained optimisation program. Scenario Optimisation is a principled framework for solving such chance-constrained programs in a data-driven manner, and it provides theoretical guarantees on the quality of the solutions obtained [1]. These guarantees are available for both convex and non-convex models. Advantageously, imprecise data can be accounted for in the scenario framework in a straightforward way, by using set-inclusion constraints for the model. A simple IPM, which targets the 100-th data percentile at training time, is trained in the scenario framework by solving

$$\operatorname{arg\,min}_{p} \Big\{ \mathbb{E}_x\big[\bar{y}_p(x) - \underline{y}_p(x)\big] \;:\; \bar{y}_p(x_i) > y_i > \underline{y}_p(x_i) \Big\},$$

where $\bar{y}_p(x_i)$ and $\underline{y}_p(x_i)$ are the upper and lower predicted bounds of the IPM (which depend on the model parameters $p$), and $x_i$ and $y_i$ are the inputs and outputs of the (precise) training data points. Due to differences between training and test performance, the guarantees of Scenario Optimisation theory provide bounds on which percentile of the data has actually been learnt. At present there is little work on training non-convex IPMs, e.g. neural networks, in practice, although shallow networks have been trained using fmincon in MATLAB [2]. The present authors have successfully applied minibatch stochastic gradient descent to novel loss functions in order to train complex IPMs in TensorFlow. The proposed loss functions are modifications of the 'vanilla' max-error loss function

$$L = \max_{i \in M} \big|\hat{y}(x_i) - y_i\big|, \qquad (1)$$

which is minimised with respect to the network weights by minibatch stochastic gradient descent, where $M$ is a randomly selected minibatch of data indices [4]. For a homoscedastic IPM, the model centre-line is defined as $\hat{y}(x) = \bar{y}(x) - h = \underline{y}(x) + h$, where $h$ is the model width, equal to the value of $L$ at the end of training. Order statistics indicate that the maximum of a selected batch of data will correspond to a particular percentile of the full data distribution, and hence this loss function allows the percentiles of the data distribution to be learnt. This poster will discuss the work in Sadeghi et al. [5], which proposes modifications to Eqn. (1) for the case of uncertain data (spec
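The sketch below illustrates, in TensorFlow, how a homoscedastic IPM centre-line can be trained with the vanilla max-error loss of Eqn. (1) by minibatch stochastic gradient descent. It is a minimal illustration, not the authors' implementation: the network architecture, optimiser, learning rate, batch size, epoch count, and synthetic data are all assumptions made for the example.

```python
import numpy as np
import tensorflow as tf

# Illustrative synthetic precise data (assumption; not from the paper).
rng = np.random.default_rng(0)
x_train = rng.uniform(-3.0, 3.0, size=(1000, 1)).astype(np.float32)
y_train = (np.sin(x_train) + 0.1 * rng.standard_normal(x_train.shape)).astype(np.float32)

# Centre-line network \hat{y}(x); the architecture is an arbitrary choice.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(1e-3)

def max_error_loss(y_true, y_pred):
    # Eqn. (1): L = max_{i in M} |\hat{y}(x_i) - y_i| over the minibatch M.
    return tf.reduce_max(tf.abs(y_pred - y_true))

dataset = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
           .shuffle(1000)
           .batch(64))

for epoch in range(200):
    for xb, yb in dataset:
        with tf.GradientTape() as tape:
            loss = max_error_loss(yb, model(xb, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

# For a homoscedastic IPM the bounds are offset from the centre-line by h,
# taken here as the maximum residual over the training data at the end of training.
h = tf.reduce_max(tf.abs(model(x_train) - y_train))
y_upper = model(x_train) + h   # \bar{y}(x)
y_lower = model(x_train) - h   # \underline{y}(x)
```

Taking $h$ as the maximum residual over all training data targets the 100-th percentile; as noted in the abstract, the batch size of the minibatch maximum is what controls which percentile of the data distribution the learnt interval represents.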