Designing Autotuned Student Models by Knowledge Distillation from Self-Supervised Teacher Models

Jaydeep Kishore, Snehasis Mukherjee

Published: 2025 (last modified: 27 Feb 2026), SN Comput. Sci., 2025. License: CC BY-SA 4.0
Abstract: As deep learning-based models continue to be developed, optimizing model accuracy and efficiency remains a challenging task. This paper applies a knowledge distillation-based approach to train a student model from a pretrained self-supervised teacher model for image classification. The study further describes how hyperparameters such as the number of layers, number of units, dropout, and learning rate of the multi-layer perceptron (MLP) classifier that follows the student model are automatically tuned (autotuned), along with the hyperparameters used in knowledge distillation (the balancing factors alpha and beta, and the temperature), using an efficient combination of Gaussian process and tree-structured Parzen estimator (TPE) algorithms. Additionally, the paper trains the student with a linear combination of the hard-target, soft-target, Barlow Twins, and cosine-similarity loss functions. The experiments were conducted on three benchmark datasets: CIFAR-10, CIFAR-100, and Caltech-101, and the results are promising compared with state-of-the-art self-supervised models. The source code for this study is publicly available on GitHub.
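For illustration, the linear combination of losses mentioned in the abstract could be written as a single weighted sum. The grouping of terms and the symbols below (L_hard, L_soft, L_BT, L_cos) are illustrative assumptions, not the paper's exact formulation:

```latex
\mathcal{L}_{\text{student}} \;=\; \alpha\,\mathcal{L}_{\text{hard}}
  \;+\; \beta\,\mathcal{L}_{\text{soft}}(T)
  \;+\; \mathcal{L}_{\text{BT}}
  \;+\; \mathcal{L}_{\text{cos}}
```

Here alpha and beta are the balancing factors and T is the distillation temperature; exactly which terms each factor weights is the paper's design choice, and this sketch only shows the form of the linear combination.

Likewise, a minimal sketch of how the listed hyperparameters might be autotuned with a TPE sampler (using Optuna here as an assumed tool; the search ranges and the `train_and_evaluate` helper with its dummy return value are placeholders, and the paper additionally combines TPE with a Gaussian process):

```python
import optuna


def train_and_evaluate(params):
    # Placeholder: distill the student from the self-supervised teacher,
    # fit the MLP head with `params`, and return validation accuracy.
    return 0.0


def objective(trial):
    # Hypothetical search space mirroring the hyperparameters named in the abstract.
    params = {
        "n_layers": trial.suggest_int("n_layers", 1, 4),
        "n_units": trial.suggest_int("n_units", 64, 1024, log=True),
        "dropout": trial.suggest_float("dropout", 0.0, 0.5),
        "lr": trial.suggest_float("lr", 1e-5, 1e-1, log=True),
        "alpha": trial.suggest_float("alpha", 0.0, 1.0),
        "beta": trial.suggest_float("beta", 0.0, 1.0),
        "temperature": trial.suggest_float("temperature", 1.0, 10.0),
    }
    return train_and_evaluate(params)


study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```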