Abstract: Real-time object detection on resource-constrained edge devices poses a significant challenge in balancing accuracy and efficiency. This paper introduces a novel knowledge distillation framework designed to enhance lightweight student models for object detection. Our approach, Multi-Scale Frequency-Aware Distillation (MSFAD), integrates three key components: multi-scale distillation, frequency-domain mask distillation, and feature-alignment distillation. Multi-scale distillation enables the student to learn feature representations at multiple levels of granularity; frequency-domain mask distillation improves the student's ability to focus on relevant regions; and feature-alignment distillation transfers channel-wise knowledge from teacher to student. We combine these terms with a standard detection loss to form a comprehensive objective, balanced by a hyperparameter $\alpha$. Experimental results across various scenarios demonstrate that MSFAD significantly improves detection accuracy while reducing computational and storage costs, yielding both notable performance gains and faster inference.
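The abstract states that the three distillation terms are combined with a traditional detection loss under a balancing hyperparameter $\alpha$, but does not give the exact form. A minimal sketch of one plausible combination, where `alpha` weights the summed distillation terms against the detection loss (the function name and the equal weighting of the three terms are assumptions for illustration):

```python
def msfad_loss(det_loss: float,
               ms_loss: float,
               freq_loss: float,
               align_loss: float,
               alpha: float = 0.5) -> float:
    """Hypothetical MSFAD objective: detection loss plus alpha-weighted
    distillation terms (multi-scale, frequency-domain mask, feature
    alignment). The exact weighting scheme is not specified in the
    abstract; this assumes a single shared balance factor alpha."""
    distill = ms_loss + freq_loss + align_loss
    return det_loss + alpha * distill
```

In practice each term would be a differentiable tensor (e.g. a PyTorch scalar) so the combined objective can be backpropagated through the student network.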
Submission Number: 147