Keywords: cnn, compression, efficient search, sets, embedded systems
TL;DR: ConvNet compression that is fast, not resource-hungry, and applies width modifiers with a new twist.
Abstract: We propose a new approach, based on discrete filter pruning, to adapt off-the-shelf models to an embedded
environment. Importantly, we circumvent the usually prohibitive costs of model compression. Our method, Structured
Coarse Block Pruning (SCBP), prunes whole CNN kernels using width modifiers applied to a novel transformation of
convolutional layers into superblocks. SCBP uses set representations to construct a rudimentary search that provides candidate
networks. To test our approach, the original ResNet architectures serve as the baseline and also provide the 'seeds'
for our candidate search. The search produces a configurable number of compressed (derived) models. These derived models
are often ~20% faster and ~50% smaller than their unmodified counterparts. At the expense of accuracy, the size can
be reduced even further and the inference latency lowered as well. The unique SCBP transformations yield many new model
variants, each with its own trade-offs, and do not require GPU clusters or expert humans for training and design.
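For intuition, a minimal sketch of the candidate enumeration as we read it from the abstract (our own assumption, not the authors' released code): each ResNet stage is treated as one superblock, a set of width modifiers is assigned per superblock, and the candidate pool is the Cartesian product of those sets. The names `superblocks`, `width_mods`, and `candidate_configs` are hypothetical.

```python
# Illustrative sketch of SCBP-style candidate enumeration (assumptions, not
# the authors' code). Each ResNet stage is one "superblock"; a width modifier
# scales how many whole filters that superblock keeps (structured pruning).
from itertools import product

# Baseline filter counts of three ResNet superblocks (e.g., a CIFAR ResNet).
superblocks = {"stage1": 16, "stage2": 32, "stage3": 64}

# Candidate width modifiers per superblock; 1.0 keeps the baseline width.
width_mods = [0.25, 0.5, 0.75, 1.0]

def candidate_configs(blocks, mods):
    """Enumerate derived models as the Cartesian product of per-superblock
    width modifiers, keeping at least one filter per superblock."""
    names = list(blocks)
    for combo in product(mods, repeat=len(names)):
        yield {n: max(1, int(blocks[n] * m)) for n, m in zip(names, combo)}

# 4 modifiers over 3 superblocks -> 64 candidate (derived) networks.
for cfg in candidate_configs(superblocks, width_mods):
    print(cfg)
```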
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)