Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks
Keywords: Neural network, Global optimality, Algorithm design, Combinatorial optimization
TL;DR: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks
Abstract: This paper introduces the first globally optimal algorithm for the
empirical risk minimization problem of two-layer maxout and ReLU networks,
i.e., minimizing the number of misclassifications. The algorithm has
a worst-case time complexity of $O\left(N^{DK+1}\right)$, where $K$
denotes the number of hidden neurons and $D$ represents the number
of features. It can be generalized to accommodate arbitrary
computable loss functions without affecting its computational complexity.
Our experiments demonstrate that the proposed algorithm provides provably
exact solutions for small-scale datasets. To handle larger datasets,
we introduce a heuristic method that reduces the data to a manageable
size, making it tractable for our algorithm. This extension enables
efficient processing of large-scale datasets and achieves significantly
better training and prediction performance than state-of-the-art approaches
(neural networks trained with gradient descent, and support vector
machines) applied to the same models (two-layer networks with
fixed hidden nodes, and linear models).
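To make the optimization target concrete, the sketch below illustrates the empirical risk minimization (0-1 loss) objective for a two-layer ReLU network on a toy dataset. This is not the paper's Deep-ICE algorithm, only the objective it minimizes; the function names, network parameters, and XOR-style data are our own illustrative assumptions.

```python
# Illustrative sketch of the ERM (0-1 loss) objective for a two-layer
# ReLU network. Names and toy data are assumptions, not from the paper.

def relu(z):
    return max(z, 0.0)

def relu_net(x, W, v, b):
    # Two-layer network: K hidden ReLU units, then a linear output layer.
    # W: list of K weight vectors (one per hidden unit, D entries each),
    # v: K output weights, b: output bias.
    hidden = [relu(sum(wj * xj for wj, xj in zip(w, x))) for w in W]
    return sum(vk * hk for vk, hk in zip(v, hidden)) + b

def erm_loss(data, W, v, b):
    # Number of misclassifications: points where the sign of the network
    # output disagrees with the label y in {-1, +1}.
    return sum(1 for x, y in data if y * relu_net(x, W, v, b) <= 0)

# Toy dataset with D = 2 features and an XOR-like labeling.
data = [((0.0, 0.0), -1), ((1.0, 1.0), -1),
        ((1.0, 0.0), +1), ((0.0, 1.0), +1)]

# K = 2 hidden units; these hand-picked weights classify the toy data exactly.
W = [(1.0, -1.0), (-1.0, 1.0)]
v = (1.0, 1.0)
b = -0.5

print(erm_loss(data, W, v, b))  # -> 0 misclassifications
```

A globally optimal ERM solver must return parameters attaining the minimum of `erm_loss` over all choices of `W`, `v`, and `b`; the abstract's $O(N^{DK+1})$ bound refers to the cost of certifying that minimum over $N$ data points.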
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 12393