An ensemble of self-supervised teachers for minimal student model with auto-tuned hyperparameters via improved Bayesian optimization
Abstract: Motivated by the growing demand for deep learning models that combine high performance with reduced computational cost through model compression and automated hyperparameter optimization, this paper introduces an Ensemble of Self-Supervised Teacher models for obtaining a Minimal Student Model (ESTMS), focusing on efficient knowledge transfer through distillation. The proposed ESTMS transfers the knowledge learned by an ensemble of self-supervised teachers to a smaller student model. In addition, the method employs an improved Bayesian optimization technique to tune the hyperparameters of the student network. This study contributes insights into knowledge distillation by combining ensemble self-supervised teachers, minimal student models, and auto-tuned hyperparameters, and its experiments demonstrate the potential of more effective and efficient knowledge transfer for target tasks. The experimental setup ensembles SimCLR, SwAV, and Barlow Twins self-supervised models as teachers, with ResNet-18 and ResNet-34 as minimal student models. Results on benchmark datasets, including CIFAR-10, CIFAR-100, and Caltech-101, show that the proposed knowledge distillation method, together with the proposed hyperparameter tuning process, enables a smaller student model to achieve state-of-the-art performance with fewer parameters and GFLOPs. The code for this study will be released at https://github.com/jk452/ESTMS after acceptance.
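Since the authors' implementation is not yet public, the following is a minimal, hypothetical PyTorch sketch of the general idea the abstract describes: an ensemble of frozen self-supervised teacher encoders (stand-ins for SimCLR, SwAV, and Barlow Twins backbones) provides an averaged representation that a small ResNet-18 student is trained to match alongside the usual supervised loss. The projector, feature dimensions, and loss weighting are illustrative assumptions, not the ESTMS method itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


def build_backbone(arch: str) -> nn.Module:
    """Return a torchvision ResNet with its classification head removed."""
    net = getattr(models, arch)(weights=None)  # in practice, load SSL checkpoints here
    net.fc = nn.Identity()                     # expose the pooled feature vector
    return net


class ESTMSSketch(nn.Module):
    """Hypothetical ensemble-distillation objective (not the authors' exact loss)."""

    def __init__(self, num_classes: int = 10, alpha: float = 0.5):
        super().__init__()
        # Three frozen teachers, stand-ins for SimCLR / SwAV / Barlow Twins encoders.
        self.teachers = nn.ModuleList(build_backbone("resnet50") for _ in range(3))
        for t in self.teachers:
            t.eval()
            for p in t.parameters():
                p.requires_grad_(False)
        # Minimal student: ResNet-18 backbone plus a linear classifier.
        self.student = build_backbone("resnet18")
        self.classifier = nn.Linear(512, num_classes)
        self.projector = nn.Linear(512, 2048)   # map student features to teacher width
        self.alpha = alpha                      # assumed weight between CE and distillation

    def forward(self, x: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            # Averaged, L2-normalised representation of the teacher ensemble.
            t_feat = torch.stack(
                [F.normalize(t(x), dim=1) for t in self.teachers]
            ).mean(dim=0)
        s_feat = self.student(x)
        ce = F.cross_entropy(self.classifier(s_feat), labels)
        s_proj = F.normalize(self.projector(s_feat), dim=1)
        distill = (1.0 - F.cosine_similarity(s_proj, t_feat, dim=1)).mean()
        return (1.0 - self.alpha) * ce + self.alpha * distill


if __name__ == "__main__":
    model = ESTMSSketch(num_classes=10)
    x = torch.randn(4, 3, 224, 224)
    y = torch.randint(0, 10, (4,))
    loss = model(x, y)
    loss.backward()
    print(float(loss))
```

In the paper, hyperparameters such as the distillation weight would additionally be tuned by the proposed improved Bayesian optimization; that component is not reproduced in this sketch.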