- Abstract: Deep neural networks have shown incredible performance for inference tasks in a variety of domains. Unfortunately, most current deep networks are enormous cloud-based structures that require significant storage space, which limits scaling of deep learning as a service (DLaaS) and use for on-device augmented intelligence. This paper finds algorithms that directly use lossless compressed representations of deep feedforward networks (with synaptic weights drawn from discrete sets), to perform inference without full decompression. The basic insight that allows less rate than naive approaches is the recognition that the bipartite graph layers of feedforward networks have a kind of permutation invariance to the labeling of nodes, in terms of inferential operation and that the inference operation depends locally on the edges directly connected to it. We also provide experimental results of our approach on the MNIST dataset.
- Keywords: lossless compression, succinct data structures, feedforward neural networks
- TL;DR: This paper finds algorithms that directly use lossless compressed representations of deep feedforward networks, to perform inference without full decompression.