Enriching Online Knowledge Distillation with Specialist Ensemble

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023.
Keywords: Online knowledge distillation, Label prior shift, Ensemble learning
TL;DR: Online knowledge distillation with an ensemble of specialist teachers, each explicitly estimated under a distinct skewed label prior.
Abstract: Online Knowledge Distillation (KD) improves on traditional KD by removing the need for a pre-trained teacher; instead, an ensemble of small teachers has become the typical source of guidance for the student's learning trajectory. Previous works emphasized diversity as the key to helpful ensemble knowledge and further argued that the degree of diversity must be substantial to prevent homogenization. This paper proposes a well-founded online KD framework with naturally derived specialists. In supervised learning, the parameters of a classifier are optimized by stochastic gradient descent with respect to the training data distribution; if that distribution shifts, the optimum and the corresponding parameters change accordingly, in a natural and explicit way. We first introduce a label prior shift to induce evident diversity among otherwise identical teachers: each teacher is assigned a skewed label distribution and is simultaneously specialized to it through importance sampling. Compared to previous works, our specialization achieves the highest level of diversity and maintains it throughout training. Second, we propose a new aggregation scheme that applies post-compensation to the specialists' outputs before conventional model averaging. Empirically, this aggregation improves ensemble calibration even when applied to previous diversity-eliciting methods. Finally, extensive experiments demonstrate the efficacy of our framework in terms of top-1 error rate, negative log-likelihood, and, notably, expected calibration error.
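To make the abstract's two mechanisms concrete, below is a minimal, hypothetical sketch of (i) specializing each teacher via importance weights induced by its assigned skewed label prior and (ii) post-compensating specialist outputs back to the target prior before model averaging. The function names (`importance_weights`, `compensate`, `aggregate`) and the exact compensation rule (reweighting softmax outputs by the ratio of target to training prior, i.e., standard label-shift correction) are illustrative assumptions, not the paper's verbatim formulation.

```python
# Hypothetical sketch, assuming post-compensation is the standard
# label-shift correction q(y|x) ∝ p_spec(y|x) * p_target(y) / p_skewed(y);
# the paper's actual formulation may differ.
import numpy as np

def importance_weights(labels, skewed_prior, empirical_prior):
    """Per-example loss weights that re-weight a shared training set toward
    one teacher's skewed label prior (importance sampling)."""
    return skewed_prior[labels] / empirical_prior[labels]

def compensate(probs, skewed_prior, target_prior):
    """Map a specialist's predictive distribution from its skewed training
    prior back to the target (e.g., uniform) prior, then renormalize."""
    adjusted = probs * (target_prior / skewed_prior)
    return adjusted / adjusted.sum(axis=-1, keepdims=True)

def aggregate(specialist_probs, skewed_priors, target_prior):
    """Average the post-compensated outputs of all specialists."""
    compensated = [compensate(p, q, target_prior)
                   for p, q in zip(specialist_probs, skewed_priors)]
    return np.mean(compensated, axis=0)

# Toy usage: 3 specialists, 4 classes, 2 samples.
rng = np.random.default_rng(0)
T, C, B = 3, 4, 2
logits = rng.normal(size=(T, B, C))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
priors = rng.dirichlet(np.ones(C), size=T)   # one skewed prior per teacher
uniform = np.full(C, 1.0 / C)                # target (balanced) prior
labels = rng.integers(0, C, size=B)
print(importance_weights(labels, priors[0], uniform))  # weights for teacher 0
print(aggregate(probs, priors, uniform))               # compensated ensemble
```

Under this reading, every teacher sees the same data but with re-weighted losses, so their optima diverge explicitly with the assigned priors; compensation then restores comparability across specialists before averaging, which is plausibly where the reported calibration benefit comes from.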
Area: Deep Learning and representational learning