Neural architecture search under black-box objectives with deep reinforcement learning and increasingly-sparse rewards

Abstract: In this paper, we address the problem of neural architecture search (NAS) in a context where the optimality policy is driven by a black-box oracle $\mathcal{O}$ with unknown form and derivatives. In this scenario, $\mathcal{O}(A_{C})$ typically provides readings from a set of sensors describing how a neural network architecture $A_{C}$ fares on the target hardware, including its power consumption, operating temperature, CPU/GPU usage, central bus occupancy, and more. Current differentiable NAS approaches fail in this setting because no derivatives are available, whereas traditional reinforcement learning NAS approaches remain too computationally expensive. As a solution, we propose a reinforcement learning NAS strategy based on policy gradients with increasingly sparse rewards. We rely on the observation [1] that one does not need to fully train the weights of two neural networks to compare them. Our solution starts by comparing architecture candidates with nearly fixed weights and no training, and progressively shifts toward comparisons under full weight training. Experimental results confirm both the accuracy and the training efficiency of our solution, as well as its compliance with soft/hard constraints imposed on the sensor feedback. Our strategy finds near-optimal architectures significantly faster, in approximately one third of the time it would take otherwise.
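A minimal sketch of how such a strategy might be wired together, assuming a REINFORCE-style policy-gradient controller over discrete architecture decisions and a training-budget schedule that grows with search progress. The oracle stub, the quadratic schedule, and all names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: policy-gradient NAS where the per-candidate training budget
# grows over the search, so early comparisons use nearly-untrained weights and
# later comparisons approach full training.
import numpy as np

rng = np.random.default_rng(0)

NUM_CHOICES = 4      # options per architecture decision (assumed)
NUM_DECISIONS = 6    # decisions per sampled architecture (assumed)
LEARNING_RATE = 0.1

# Controller: independent softmax logits per decision (a stand-in for an RNN policy).
logits = np.zeros((NUM_DECISIONS, NUM_CHOICES))

def sample_architecture(logits):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    arch = np.array([rng.choice(NUM_CHOICES, p=p) for p in probs])
    return arch, probs

def oracle_score(arch, train_epochs):
    """Stand-in for the black-box oracle O(A_C): train the candidate for
    `train_epochs` epochs, then read the hardware sensors and return a scalar
    reward. Replace with the real measurement pipeline."""
    quality = -np.abs(arch - NUM_CHOICES // 2).sum()       # dummy objective
    noise = rng.normal(scale=1.0 / (1 + train_epochs))     # less noise with more training
    return quality + noise

def budget_schedule(step, total_steps, max_epochs=50):
    """Training budget grows with search progress: near-zero training early on,
    close-to-full training near the end."""
    return int(max_epochs * (step / max(total_steps - 1, 1)) ** 2)

TOTAL_STEPS = 200
baseline = 0.0
for step in range(TOTAL_STEPS):
    arch, probs = sample_architecture(logits)
    reward = oracle_score(arch, budget_schedule(step, TOTAL_STEPS))
    baseline = 0.9 * baseline + 0.1 * reward                # moving-average baseline
    advantage = reward - baseline
    # REINFORCE update on each decision's logits.
    for d, choice in enumerate(arch):
        grad = -probs[d]
        grad[choice] += 1.0
        logits[d] += LEARNING_RATE * advantage * grad

print("most likely architecture:", logits.argmax(axis=1))
```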