Abstract: Integrating self-supervised learning (SSL) prior to supervised learning (SL) is a prevalent strategy for enhancing model performance, especially in scenarios with limited labeled data. Nonetheless, this approach inherently introduces a trade-off between computational efficiency and performance gains. Although SSL significantly improves representation learning, it necessitates an additional and often computationally expensive training phase, imposing substantial overhead in resource-constrained environments. To mitigate this limitation, we propose MixTraining, a novel training framework that interleaves multiple epochs of SSL and SL within a unified $\textit{mixtraining phase}$. This phase enables a seamless transition between self-supervised and supervised objectives, fostering synergy between the two and improving overall accuracy. Additionally, MixTraining consolidates shared computational steps, thereby eliminating redundant computation and lowering overall training latency. Comprehensive experimental evaluations demonstrate that MixTraining offers a superior trade-off between computational efficiency and model performance compared to conventional training pipelines. Specifically, on the TinyImageNet dataset with the ViT-Tiny model, MixTraining achieves an absolute accuracy improvement of 8.81% (a relative gain of 18.89%) while concurrently accelerating training by 1.29$\times$.
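To make the interleaving idea in the abstract concrete, the following is a minimal, hypothetical sketch of one combined training step in which a single backbone forward pass is shared by a self-supervised and a supervised objective. The rotation-prediction pretext task, the function name `mixtraining_step`, and the arguments `backbone`, `ssl_head`, `cls_head`, and `ssl_weight` are illustrative assumptions, not the paper's actual implementation or API.

```python
# Hypothetical sketch: one "mixed" step that shares backbone computation
# between a self-supervised (rotation prediction, assumed) and a supervised
# (cross-entropy classification) objective.
import torch
import torch.nn.functional as F

def mixtraining_step(backbone, ssl_head, cls_head, optimizer,
                     images, labels, ssl_weight=1.0):
    # Apply one of four rotations to the whole batch (assumed pretext task).
    k = int(torch.randint(0, 4, (1,)).item())
    rotated = torch.rot90(images, k, dims=(2, 3))
    rot_targets = torch.full((images.size(0),), k,
                             dtype=torch.long, device=images.device)

    # Single backbone forward pass reused by both heads (shared computation).
    features = backbone(rotated)
    ssl_loss = F.cross_entropy(ssl_head(features), rot_targets)  # self-supervised
    sl_loss = F.cross_entropy(cls_head(features), labels)        # supervised

    loss = sl_loss + ssl_weight * ssl_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.detach())
```

In an interleaved schedule, such a step could alternate with (or replace) separate SSL-only and SL-only epochs; how the two objectives are scheduled and weighted in MixTraining itself is specified in the paper, not in this sketch.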
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yannis_Kalantidis2
Submission Number: 5808