TL;DR: This paper proposes a novel concept and method for black-box training of neural networks by zeroth-order optimization.
Abstract: This paper proposes a novel concept of natural perturbations for black-box training of neural networks by zeroth-order optimization. When a neural network is implemented directly in hardware, training its parameters by backpropagation yields inaccurate results because detailed internal information is unavailable. We instead employ zeroth-order optimization, in which the sampling of parameter perturbations is of central importance. Our proposed sampling strategy maximizes the entropy of the perturbations under a regularization, inherited from the concept of the natural gradient, that keeps the probability distribution modeled by the neural network from changing drastically. Experimental results show the superiority of our proposal across diverse datasets, tasks, and architectures.
Lay Summary: Neural networks are generally implemented on ordinary computers with CPUs or GPUs and memory, and are trained using the well-known backpropagation algorithm, which requires detailed internal information stored in memory. In contrast, there has been growing interest in developing methods that train neural networks without detailed internal information.
We employ a black-box optimization method, where we perturb neural network parameters slightly and observe how the training loss function changes. While most existing methods perturb each parameter independently, our method considers parameter correlations and perturbs them so that the neural network's output does not change drastically. We call such generated perturbations *natural perturbations*. The term *natural* has the same meaning as *natural gradient* used when detailed internal information is available.
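The baseline procedure described above (perturb the parameters, observe the change in the loss, and estimate an update direction) can be sketched as follows. This is a minimal, generic two-sided zeroth-order gradient estimator with independent Gaussian perturbations, i.e., the kind of existing method the paper contrasts against; it is not the authors' natural-perturbation sampler, whose correlated sampling distribution is defined in the paper itself. All function and variable names here are illustrative.

```python
import random

def zo_gradient_estimate(loss, params, sigma=1e-3, n_samples=100):
    """Two-sided zeroth-order gradient estimate.

    Each parameter is perturbed independently by a Gaussian direction u,
    and the loss difference at params +/- sigma*u is used to weight u.
    (Natural perturbations would instead sample correlated directions
    chosen so that the network's output changes little.)
    """
    d = len(params)
    grad = [0.0] * d
    for _ in range(n_samples):
        # Independent Gaussian perturbation direction.
        u = [random.gauss(0.0, 1.0) for _ in range(d)]
        plus = [p + sigma * ui for p, ui in zip(params, u)]
        minus = [p - sigma * ui for p, ui in zip(params, u)]
        # Finite-difference weight for this direction.
        scale = (loss(plus) - loss(minus)) / (2.0 * sigma * n_samples)
        for i in range(d):
            grad[i] += scale * u[i]
    return grad
```

For example, on the quadratic loss `sum(x*x for x in w)` the estimate approaches the true gradient `2w` as the number of sampled directions grows, so it can drive an ordinary gradient-descent loop without any access to the network's internals.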
The experimental results show that our method clearly outperforms existing methods. Our contribution accelerates research on emerging methods that train neural networks directly implemented on hardware or in memory-constrained environments.
Primary Area: Optimization->Zero-order and Black-box Optimization
Keywords: Zeroth-order optimization, Multivariate normal distribution, Covariance matrix, Fisher information matrix
Submission Number: 4193