Abstract: Interval Predictor Models (IPMs) are robust regression models which learn to represent percentiles of the data
distribution, without using probabilistic priors [3]. The training of IPMs can be stated as a chance constrained optimisation
program. Scenario Optimisation is a principled framework for solving such chance constrained programs in a data-driven manner, and it provides theoretical guarantees on the quality of the obtained solutions [1]. These guarantees are available for both convex and non-convex models. Advantageously, imprecise data can be trivially accounted for in the
scenario framework, by using set inclusion constraints for the model. A simple IPM, which targets the 100th percentile of the data at training time, is trained in the scenario framework by solving
$$\operatorname*{argmin}_p \left\{\, \mathbb{E}_x\!\left[\bar{y}_p(x) - \underline{y}_p(x)\right] \;:\; \bar{y}_p(x_i) > y_i > \underline{y}_p(x_i) \,\right\},$$
where $\bar{y}_p(x_i)$ and $\underline{y}_p(x_i)$ are the upper and lower predicted bounds of the IPM (which depend on the model parameters $p$), and $x_i$ and $y_i$ are the inputs and outputs of the (precise) training data points. Due to differences between training and test performance, the
guarantees of Scenario Optimisation theory provide bounds on which percentile of the data has actually been learnt.
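For concreteness, when the IPM is linear in its parameters and has a constant half-width, the scenario program above reduces to a linear program. The following is a minimal sketch of that convex case; the polynomial basis, toy data, and cvxpy formulation are illustrative assumptions, not the formulation of [3], and the strict inequalities are relaxed to non-strict ones as numerical solvers require.

```python
import cvxpy as cp
import numpy as np

# Toy precise data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=200)
y = np.sin(x) + 0.2 * rng.normal(size=200)

# Linear-in-parameters IPM with a cubic polynomial basis and constant
# half-width h, so ybar_p(x) = Phi(x) @ w + h, ylow_p(x) = Phi(x) @ w - h,
# and the expected width E_x[ybar - ylow] is the constant 2 * h.
Phi = np.vander(x, 4)          # basis matrix, one row per scenario
w = cp.Variable(4)             # centre-line coefficients
h = cp.Variable(nonneg=True)   # half-width of the predicted interval

# One pair of scenario constraints per data point: every observation must
# lie inside the predicted interval (the 100th-percentile target). For
# imprecise data, each y[i] would become an uncertainty set and the
# constraints set inclusions, e.g. enclosing an interval [y_lo, y_hi].
constraints = [Phi @ w + h >= y, Phi @ w - h <= y]

# Minimise the expected interval width (here constant, equal to 2 * h).
cp.Problem(cp.Minimize(2 * h), constraints).solve()
print("half-width h =", h.value)
```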
At present there is little work on training non-convex IPMs, e.g. Neural Networks, in practice, although shallow networks have been trained using fmincon in MATLAB [2]. The present authors have successfully applied minibatch stochastic gradient descent to novel loss functions in order to train complex IPMs in TensorFlow. The proposed loss functions are modifications of the 'vanilla' Max-Error loss function, that is
$$ L = \max_{i \in M} \left|\, \hat{y}(x_i) - y_i \,\right|, \qquad (1) $$
which is minimised over the network weights with minibatch stochastic gradient descent, where $M$ is a randomly selected minibatch of data indices [4]. For a homoscedastic IPM, the model centre-line is defined as $\hat{y}(x) = \bar{y}(x) - h = \underline{y}(x) + h$, where $h$ is the model half-width, equal to the value of $L$ at the end of training. Order statistics indicate that the maximum of a selected batch of data will fall at a particular percentile of the full data distribution (for a minibatch of $m$ i.i.d. points, the batch maximum lies at the $m/(m+1)$ quantile in expectation), and hence this loss function allows percentiles of the data distribution to be learnt; a minimal training sketch is given below. This poster will discuss the work in Sadeghi et al. [5], which proposes modifications to Eqn. (1) for the case of uncertain data.
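For reference, the sketch below trains a small homoscedastic IPM with the vanilla loss of Eqn. (1) in TensorFlow, using minibatch stochastic gradient descent; the network architecture, optimiser settings, batch size, and toy data are illustrative assumptions, not the configuration of [5].

```python
import numpy as np
import tensorflow as tf

# Toy precise data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(1000, 1)).astype(np.float32)
y = (np.sin(x) + 0.2 * rng.normal(size=x.shape)).astype(np.float32)

# Small network for the centre-line y_hat(x); architecture is an assumption.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="tanh"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.05)

batch_size = 64  # |M|; the batch maximum targets roughly the m/(m+1) quantile
dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(1000).batch(batch_size)

for epoch in range(100):
    for xb, yb in dataset:
        with tf.GradientTape() as tape:
            # Vanilla Max-Error loss of Eqn. (1): L = max_{i in M} |y_hat(x_i) - y_i|
            loss = tf.reduce_max(tf.abs(model(xb, training=True) - yb))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

# The half-width h is the value of L at the end of training; the IPM bounds
# are then upper(x) = model(x) + h and lower(x) = model(x) - h.
h = float(loss)
print("half-width h =", h)
```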