- Abstract: Neural networks offer high-accuracy solutions to a range of problems, but are computationally costly to run in production systems. We propose a technique called Deep Learning Approximation to take an already-trained neural network model and build a faster (and almost equally accurate) network by manipulating the network structure and coefficients without requiring re-training or access to the training data. Speedup is achieved by applying a sequential series of independent optimizations that reduce the floating-point operations (FLOPs) required to perform a forward pass. An optimal lossy approximation is chosen for each layer by weighing the relative accuracy loss and FLOP reduction. On PASCAL VOC 2007 with the YOLO network, we show an end-to-end 2x speedup in a network forward pass with a $5$\% drop in mAP that can be re-gained by finetuning, enabling this network (and others like it) to be deployed in compute-constrained systems.
- Keywords: neural network speedup, low-rank approximation
- TL;DR: Decompose weights to use fewer FLOPs with SVD