Knowledge Distillation via Flow Matching

15 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Knowledge Transfer, Offline Knowledge Distillation, Online Knowledge Distillation, Ensemble, Flow-based Model
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a novel and highly scalable knowledge transfer framework that introduces Rectified flow into knowledge distillation and relies on multi-step sampling strategies to achieve precise flow matching.
Abstract: In this paper, we propose a novel knowledge transfer framework that introduces Rectified flow into knowledge distillation and leverages multi-step sampling strategies to achieve precise flow matching. We name this framework Knowledge Distillation via Flow Matching (FM-KD). It can be integrated with any form of metric-based distillation method (\textit{e.g.}, vanilla KD, DKD, PKD, and DIST) and a meta-encoder of any available architecture (\textit{e.g.}, CNN, MLP, and Swin-Transformer), and it achieves significant accuracy improvements for the student. We theoretically show that the training objective of FM-KD is equivalent to minimizing an upper bound of the negative log-likelihood of the teacher's feature map or logits. Moreover, FM-KD can be viewed as a unique implicit ensemble method that leads to performance gains. With a slight modification, FM-KD can also be transformed into an online distillation framework, OFM-KD, which likewise yields desirable performance gains. Through extensive experiments on the CIFAR-100, ImageNet-1k, and MS-COCO datasets, we empirically validate the scalability and state-of-the-art performance of our proposed methods against relevant comparison approaches.
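
The following is a minimal sketch of what a rectified-flow-style distillation objective with multi-step sampling could look like, reconstructed only from the abstract above; the names `velocity_net`, `flow_matching_kd_loss`, and `multi_step_transport`, as well as all hyperparameters, are illustrative assumptions and not the paper's actual implementation.

```python
# Hypothetical sketch of flow-matching-based knowledge distillation.
# Assumption: the meta-encoder (velocity_net) predicts the velocity that
# transports student logits toward teacher logits along a straight path,
# as in rectified flow; this is NOT the authors' released code.
import torch
import torch.nn.functional as F


def flow_matching_kd_loss(student_logits, teacher_logits, velocity_net, num_train_samples=4):
    """Rectified-flow matching between student and teacher logits.

    Interpolant: x_t = (1 - t) * student + t * teacher.
    The meta-encoder is trained to predict the constant target velocity
    (teacher - student) at randomly sampled times t.
    """
    loss = 0.0
    for _ in range(num_train_samples):
        t = torch.rand(student_logits.size(0), 1, device=student_logits.device)
        x_t = (1 - t) * student_logits + t * teacher_logits.detach()
        target_v = teacher_logits.detach() - student_logits
        pred_v = velocity_net(x_t, t)
        loss = loss + F.mse_loss(pred_v, target_v)
    return loss / num_train_samples


@torch.no_grad()
def multi_step_transport(student_logits, velocity_net, num_steps=8):
    """Multi-step (Euler) sampling: transport student logits toward the
    teacher distribution by integrating the learned velocity field."""
    x = student_logits
    for i in range(num_steps):
        t = torch.full((x.size(0), 1), i / num_steps, device=x.device)
        x = x + velocity_net(x, t) / num_steps
    return x
```

In this sketch, the multi-step Euler integration at inference plays the role of the "multi-step sampling strategies" mentioned in the abstract, progressively refining the student's prediction toward the teacher's; the actual FM-KD objective and sampler may differ in detail.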
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 120