Abstract: Quantization-aware training (QAT) achieves competitive performance and is widely used for model compression in image classification tasks. Existing QAT works start with a pre-trained full-precision model and perform quantization during retraining. However, these works require supervision from ground-truth labels, while sufficient labeled data are often unavailable in real-world environments. They also suffer from accuracy loss due to reduced precision, and no single algorithm consistently achieves the best (or worst) performance across model architectures. To address these limitations, this paper proposes a novel Self-Supervised Quantization-Aware Knowledge Distillation framework (SQAKD). SQAKD unifies the forward and backward dynamics of various quantization functions, making it flexible enough to incorporate a wide range of QAT works. With the full-precision model as the teacher and the low-bit model as the student, SQAKD reframes QAT as a co-optimization problem that simultaneously minimizes the KL-Loss (i.e., the Kullback-Leibler divergence loss between the teacher's and student's penultimate outputs) and the discretization error (i.e., the difference between the full-precision weights/activations and their quantized counterparts). This optimization is performed in a self-supervised manner without labeled data. The evaluation shows that SQAKD significantly improves the performance of various state-of-the-art QAT works (e.g., PACT, LSQ, DoReFa, and EWGS). SQAKD establishes stronger baselines and does not require extensive labeled training data, potentially making state-of-the-art QAT research more accessible.
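To make the described co-optimization concrete, the PyTorch-style sketch below shows one plausible form of such a label-free objective: a temperature-scaled KL term between the teacher's and student's outputs plus a weighted discretization-error term. The function name `sqakd_loss`, the parameters `temperature` and `lambda_q`, and the use of a squared-error penalty are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sqakd_loss(student_logits, teacher_logits, fp_weights, quantized_weights,
               temperature=4.0, lambda_q=1.0):
    """Illustrative SQAKD-style objective (a sketch, not the paper's exact loss).

    Combines (1) the KL divergence between the teacher's and student's
    temperature-scaled outputs -- requiring no ground-truth labels -- and
    (2) a discretization-error term penalizing the gap between full-precision
    tensors and their quantized counterparts.
    """
    # (1) Self-supervised distillation term: the low-bit student mimics the
    #     full-precision teacher's output distribution.
    kl_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # (2) Discretization error, shown here as a mean-squared difference between
    #     full-precision weights and their quantized versions (activations
    #     could be handled analogously).
    disc_loss = sum(
        F.mse_loss(w_fp, w_q) for w_fp, w_q in zip(fp_weights, quantized_weights)
    )

    # lambda_q is a hypothetical balancing coefficient for the two terms.
    return kl_loss + lambda_q * disc_loss
```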