How to distill task-agnostic representations from many teachers?

ICLR 2025 Conference Submission 12387 Authors

27 Sept 2024 (modified: 23 Nov 2024), ICLR 2025 Conference Submission, CC BY 4.0
Keywords: knowledge distillation, representation learning, natural language processing, molecular modeling, computer vision, embedding models
TL;DR: We propose a grounded, task-agnostic multi-teacher distillation procedure derived from a "majority vote" loss function, yielding high-quality unsupervised embedders.
Abstract: Casting complex inputs onto tractable representations is a critical step in many fields. Differences in architectures, loss functions, input modalities, and datasets lead to embedding models that capture diverse information about the input. Multi-teacher distillation seeks to exploit this diversity to create richer representations, but it often remains task-specific. We extend this framework by proposing a task-oriented setting that introduces an objective function based on the "majority vote" principle. We demonstrate that the mutual information between the student and the teachers is an upper bound for this function, providing a task-agnostic loss for our distillation procedure. We perform an extensive evaluation across different domains (natural language processing, computer vision, and molecular modeling), showing that our method effectively leverages teacher diversity to produce more informative representations. Finally, we use our method to train and release new state-of-the-art embedders, enabling improved downstream performance in NLP and molecular modeling.
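
Illustrative sketch (not part of the submission): the abstract describes distilling a student embedder from several frozen teachers with a task-agnostic, mutual-information-based loss. The snippet below shows one plausible instantiation in PyTorch, using an InfoNCE-style contrastive estimator as the mutual-information term; the class name, projection heads, dimensions, and the choice of InfoNCE are assumptions made for illustration only, not the authors' actual method.

    # Hedged sketch (assumed design, not the authors' code): a multi-teacher
    # distillation loss that maximizes an InfoNCE-style lower bound on the mutual
    # information between a student embedding and each frozen teacher embedding.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiTeacherDistillLoss(nn.Module):
        def __init__(self, student_dim, teacher_dims, proj_dim=256, temperature=0.07):
            super().__init__()
            # One projection head per teacher so embeddings of different sizes
            # can be compared in a shared space.
            self.student_proj = nn.ModuleList([nn.Linear(student_dim, proj_dim) for _ in teacher_dims])
            self.teacher_proj = nn.ModuleList([nn.Linear(d, proj_dim) for d in teacher_dims])
            self.temperature = temperature

        def forward(self, student_emb, teacher_embs):
            # student_emb: (batch, student_dim); teacher_embs[k]: (batch, teacher_dims[k])
            total = 0.0
            for k, t_emb in enumerate(teacher_embs):
                s = F.normalize(self.student_proj[k](student_emb), dim=-1)
                t = F.normalize(self.teacher_proj[k](t_emb.detach()), dim=-1)  # teachers stay frozen
                logits = s @ t.T / self.temperature          # (batch, batch) similarity matrix
                labels = torch.arange(s.size(0), device=s.device)
                # InfoNCE: each student vector should identify its own teacher vector in the batch.
                total = total + F.cross_entropy(logits, labels)
            return total / len(teacher_embs)

    # Usage sketch: frozen teachers produce embeddings; the student and the
    # loss's projection heads are trained jointly.
    loss_fn = MultiTeacherDistillLoss(student_dim=384, teacher_dims=[768, 1024])
    student_out = torch.randn(8, 384, requires_grad=True)
    teacher_outs = [torch.randn(8, 768), torch.randn(8, 1024)]
    loss = loss_fn(student_out, teacher_outs)
    loss.backward()

Using one projection head per teacher lets teachers with different embedding dimensions be aligned before the contrastive term is computed; how the actual paper aggregates the per-teacher terms (e.g., averaging versus a majority-vote weighting) is not specified in the abstract and is assumed here.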
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12387