Abstract: Online knowledge distillation (KD) has drawn increasing attention in recent years. However, little attention has been paid to the capacity gap between the models in the online KD paradigm. In this work, we investigate the impact of the capacity gap and experimentally verify that a large capacity gap can have a detrimental effect on performance. To address this issue, we propose Auxiliary Branch assisted Mutual Learning (ABML), which leverages auxiliary branches to mitigate the performance deterioration caused by the capacity gap. Experimental results show that ABML significantly improves the performance of online KD even when a capacity gap is present. Our code is available at https://github.com/maorongwang/ABML.