TL;DR: an efficient analog of the universal approximation theorem for neural networks over the boolean hypercube
Abstract: The universal approximation theorem for neural networks says that any reasonable function is well-approximated by a two-layer neural network with sigmoid gates but it does not provide good bounds on the number of hidden-layer nodes or the weights. However, robust concepts often have small neural networks in practice. We show an efficient analog of the universal approximation theorem on the boolean hypercube in this context. We prove that any noise-stable boolean function on n boolean-valued input variables can be well-approximated by a two-layer linear threshold circuit with a small number of hidden-layer nodes and small weights, that depend only on the noise-stability and approximation parameters, and are independent of n. We also give a polynomial time learning algorithm that outputs a small two-layer linear threshold circuit that approximates such a given function. We also show weaker generalizations of this to noise-stable polynomial threshold functions and noise-stable boolean functions in general.
Conflicts: microsoft.com, cs.utexas.edu