TL;DR: Generalizing neural nets with hinge activation to abs-linear models
Keywords: Abs-linear/normal form, piecewise linearization, dynamic search trajectory, back propagation
Abstract: We consider predictor functions $f(w;x)$ in abs-linear form, a generalization of neural nets with hinge activation. To train them on a given data set of feature-label pairs $(x,y)$, one has to minimize the average loss, which is a multi-piecewise linear or quadratic function of the weights $w$, i.e., the coefficients of the abs-linear form. We propose to attack this nonsmooth global optimization problem via successive piecewise linearization, which allows the application of coordinate search, gradient-based methods, or mixed-binary linear optimization. These alternative methods solve the resulting sequence of abs-linear model problems with a proximal term, as demonstrated in \cite{GrRoLOD}. More general predictor functions $f(w;x)$ given in abs-normal form can be successively abs-linearized with respect to the weights $w$ and then optimized in a nested iteration. In the talk we will present numerical validations and comparisons with standard methods such as ADAM \cite{adam}, e.g., on the MNIST problem.
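For orientation, a minimal sketch of the abs-normal representation assumed above (the symbol choices $c, Z, L, b, J, Y$ are illustrative and not fixed by the abstract): the predictor is evaluated as
$$ z \;=\; c + Z x + L\,|z|, \qquad f(w;x) \;=\; b + J x + Y\,|z|, $$
where $L$ is strictly lower triangular, so the switching vector $z$ and hence $|z|$ can be computed by forward substitution, and the weight vector $w$ collects the coefficients $(c, Z, L, b, J, Y)$. A neural net with hinge (ReLU) activation is a special case of this form, since $\max(v,0) = (v + |v|)/2$.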