Progressive Knowledge Distillation: Balancing Inference Latency and Accuracy at Runtime

Don Dennis; Abhishek Shetty; Anish Sevekari; Kazuhito Koishida; Virginia Smith

Published: 20 Jun 2023, Last Modified: 16 Jul 2023, ES-FoMO 2023 Poster

Keywords: Efficient inference, on-device inference, resource efficient, run-time tradeoff, edge computing

Abstract: We study the problem of progressive distillation: given a large, pretrained teacher model $g$, we seek to decompose it into an ensemble of smaller student models $f_i$ with low inference cost, such that progressively evaluating additional models in this ensemble strictly improves on the previous predictions. For user-facing inference applications, this allows us to flexibly trade accuracy for inference latency at runtime. We develop a boosting-based algorithm, B-DISTIL, for progressive distillation and demonstrate its effectiveness on standard datasets.
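The runtime trade-off can be made concrete with a minimal sketch of anytime inference over an already-trained ensemble of students. The sketch rests on two assumptions. First, as is common for boosting-style ensembles, it assumes the students' class scores are summed. Second, it assumes each student has a known, fixed inference cost. The `Student` class, the `progressive_predict` and `argmax` functions, and all cost values are hypothetical illustrations. They are not B-DISTIL itself, which concerns how the students are trained so that each added model improves the prediction.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical student model: maps an input to a list of per-class scores.
StudentFn = Callable[[Sequence[float]], List[float]]


@dataclass
class Student:
    fn: StudentFn
    cost: float  # assumed fixed inference cost, e.g. in milliseconds


def progressive_predict(
    students: List[Student], x: Sequence[float], budget: float
) -> Tuple[List[float], int]:
    """Evaluate f_1, f_2, ... in order, summing their scores (assumed
    additive combination), and stop before the next model would exceed
    the latency budget. Returns (accumulated scores, models evaluated)."""
    scores: List[float] = []
    spent = 0.0
    used = 0
    for s in students:
        if spent + s.cost > budget:
            break  # next model does not fit in the remaining budget
        out = s.fn(x)
        # First model initializes the scores; later models add to them.
        scores = list(out) if not scores else [a + b for a, b in zip(scores, out)]
        spent += s.cost
        used += 1
    return scores, used


def argmax(scores: List[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])
```

With a larger budget, more students are evaluated and the accumulated scores (and possibly the predicted class) are refined:

```python
students = [
    Student(lambda x: [1.0, 0.0], cost=1.0),
    Student(lambda x: [-0.5, 1.0], cost=2.0),
    Student(lambda x: [0.25, 0.5], cost=4.0),
]
scores, used = progressive_predict(students, [0.0], budget=3.0)
# scores == [0.5, 1.0], used == 2, argmax(scores) == 1
```

The inference loop itself cannot guarantee that each additional model improves accuracy. That property has to come from how the students are trained.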

Submission Number: 30
