Abstract: Knowledge distillation (KD), which trains a smaller student model to approximate the predictions of a larger teacher model, is useful for striking a balance between model accuracy and computational constraints. However, KD has been found to be ineffective when there is a significant capacity gap between the teacher and student models. In this work, we address this issue via "meta-collaborative distillation" (MC-Distil), in which students of varying capacities collaborate during distillation. Using a "coordinator" network (C-Net), MC-Distil frames mutual learning among students as a meta-learning task. Our insight is that C-Net learns from each student's performance and from the characteristics of each training instance, allowing students of different capacities to improve together. Our method improves accuracy for all students, surpassing state-of-the-art baselines, including multi-step distillation, consensus enforcement, and teacher re-training. We achieve average gains of 2.5% on CIFAR100 and 2% on TinyImageNet, consistently across diverse student sizes, teacher sizes, and architectures. Notably, the observation that larger students also benefit from meta-collaboration with smaller students is novel. Finally, MC-Distil excels at training superior student models under real-world conditions such as label noise and domain adaptation.
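For context on the distillation setup the abstract builds on, the sketch below shows the standard KD objective (Hinton-style): a KL divergence between temperature-softened teacher and student predictions combined with the usual cross-entropy loss. This is generic background only, not the MC-Distil method itself; the temperature `T` and weight `alpha` are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard knowledge-distillation loss: weighted sum of a softened
    KL term (student mimics teacher) and cross-entropy on true labels.
    T and alpha are placeholder hyperparameters for illustration."""
    # Temperature-soften both distributions.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # Scale the KL term by T^2 to keep gradient magnitudes comparable.
    distill = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce
```

MC-Distil's contribution, per the abstract, is to go beyond this single teacher-student objective by letting students of different capacities learn from one another through a coordinator network trained as a meta-learner.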
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Hongsheng_Li3
Submission Number: 4706