TL;DR: We distill the ensemble's predictive distribution into a single student model using flow matching.
Abstract: Neural network ensembles have proven effective in improving performance across a range of tasks; however, their high computational cost limits their applicability in resource-constrained environments or for large models. Ensemble distillation, the process of transferring knowledge from an ensemble teacher to a smaller student model, offers a promising solution to this challenge. The key is to ensure that the student model is cost-efficient while achieving performance comparable to the ensemble teacher. With this in mind, we propose a novel ensemble distribution distillation method that leverages flow matching to effectively transfer the diversity of the ensemble teacher to the student model. Our extensive experiments demonstrate that the proposed method outperforms existing ensemble distillation approaches.
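To make the idea concrete, the sketch below shows one way a flow-matching objective can be used for ensemble distribution distillation: the student learns a velocity field that transports Gaussian noise to the logits of a randomly sampled ensemble member, conditioned on the input, so that sampling the learned flow produces diverse, ensemble-like predictions. This is an illustrative assumption about the setup, not the paper's exact formulation; the module names (`student_backbone`, `flow_head`) and the linear interpolation path are placeholders.

```python
# Minimal sketch (not the authors' code) of a conditional flow-matching loss
# for ensemble distribution distillation, assuming PyTorch and hypothetical
# `student_backbone` / `flow_head` modules.
import torch
import torch.nn.functional as F

def flow_matching_distillation_loss(student_backbone, flow_head, x, teacher_logits):
    """
    x:              (B, ...) input batch
    teacher_logits: (B, M, C) logits from M ensemble members (flow targets)
    """
    B, M, C = teacher_logits.shape

    # Pick one ensemble member per example as the flow's endpoint y1.
    member = torch.randint(0, M, (B,), device=x.device)
    y1 = teacher_logits[torch.arange(B, device=x.device), member]   # (B, C)

    y0 = torch.randn_like(y1)                      # noise endpoint
    t = torch.rand(B, 1, device=x.device)          # interpolation time in [0, 1]
    y_t = (1.0 - t) * y0 + t * y1                  # linear probability path
    target_velocity = y1 - y0                      # d y_t / d t along the path

    # Condition the velocity prediction on student features of the input.
    feats = student_backbone(x)                    # (B, D)
    pred_velocity = flow_head(y_t, t, feats)       # (B, C)

    return F.mse_loss(pred_velocity, target_velocity)
```

At test time, one would integrate the learned velocity field from noise to obtain samples that mimic the ensemble's predictive distribution; see the linked repository for the actual implementation.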
Lay Summary: Ensembles, which combine multiple neural networks, improve the accuracy and reliability of predictions but are too costly for many real-world applications. Ensemble distillation addresses this by training a smaller student model to imitate a larger teacher ensemble, effectively transferring knowledge from teacher to student. A key challenge is preserving the diversity of predictions, one of the ensemble’s main strengths. We propose a method that helps the student learn not just the average output but also the variation in the ensemble’s answers. This is achieved using a technique called flow matching, which captures richer information about the ensemble’s behavior than the average alone. Our method outperforms existing ensemble distillation approaches and enables efficient, high-quality models for use in resource-limited environments.
Link To Code: https://github.com/Park-Jong-Geon/EDFM
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Ensemble Distillation, Flow Matching
Submission Number: 15544