Model Specialization for Inference Via End-to-End Distillation, Pruning, and Cascades

Daniel Kang; Karey Shi; Thao Ngyuen; Stephanie Mallard; Peter Bailis; Matei Zaharia

Model Specialization for Inference Via End-to-End Distillation, Pruning, and Cascades

Daniel Kang, Karey Shi, Thao Ngyuen, Stephanie Mallard, Peter Bailis, Matei Zaharia

15 Feb 2018 (modified: 10 Feb 2022)ICLR 2018 Conference Blind SubmissionReaders: Everyone

Abstract: The availability of general-purpose reference and benchmark datasets such as ImageNet have spurred the development of general-purpose popular reference model architectures and pre-trained weights. However, in practice, neural net- works are often employed to perform specific, more restrictive tasks, that are narrower in scope and complexity. Thus, simply fine-tuning or transfer learn- ing from a general-purpose network inherits a large computational cost that may not be necessary for a given task. In this work, we investigate the potential for model specialization, or reducing a model’s computational footprint by leverag- ing task-specific knowledge, such as a restricted inference distribution. We study three methods for model specialization—1) task-aware distillation, 2) task-aware pruning, and 3) specialized model cascades—and evaluate their performance on a range of classification tasks. Moreover, for the first time, we investigate how these techniques complement one another, enabling up to 5× speedups with no loss in accuracy and 9.8× speedups while remaining within 2.5% of a highly ac- curate ResNet on specialized image classification tasks. These results suggest that simple and easy-to-implement specialization procedures may benefit a large num- ber practical applications in which the representational power of general-purpose networks need not be inherited.

Data: [Oxford 102 Flower](https://paperswithcode.com/dataset/oxford-102-flower), [YFCC100M](https://paperswithcode.com/dataset/yfcc100m)

4 Replies

Loading