Knowledge Distillation with Global Filters for Efficient Human Pose Estimation

Published: 28 Nov 2024, Last Modified: 04 Mar 2025 | Venue: BMVC | Readers: Everyone | License: CC BY 4.0
Abstract: Efficient and accurate 2D human pose estimation (2D-HPE) remains a critical challenge that must be overcome before it can be deployed on resource-constrained devices. This paper introduces a novel framework that combines knowledge distillation with Global Filter Layers (GFL) to enable efficient and scalable 2D human pose estimation. Our approach uses a high-capacity heatmap network as the teacher to train a lightweight student network. The student network employs global spectral filters as an alternative to attention-based token mixers, which lowers computational complexity and raises throughput. We apply this approach specifically to coordinate classification and regression-based 2D-HPE methods because they are faster than heatmap models. We extensively evaluate our approach on the MPII dataset with both regression and coordinate classification student networks and with different filter weighting strategies. Although our model is lightweight, it achieves an increase of about 18% in throughput and, with an accuracy of 89.40 PCKh@0.5, closes the performance gap with large state-of-the-art 2D-HPE models.
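To make the idea of a global spectral filter as a token mixer concrete, the following is a minimal NumPy sketch of the general technique: transform the token grid to the frequency domain, multiply element-wise by a filter, and transform back. It is an illustration under stated assumptions, not the paper's layer. The function name `global_filter`, the tensor shapes, and the all-ones filter are hypothetical, and the filter weighting strategies mentioned in the abstract are not modelled here.

```python
import numpy as np


def global_filter(tokens, spectral_filter):
    """Mix tokens across all spatial positions by filtering in the frequency domain.

    tokens:          real array of shape (H, W, C), an assumed token-grid layout
    spectral_filter: complex array of shape (H, W // 2 + 1, C); in a trained
                     model these would be learnable weights (assumption)
    """
    h, w, _ = tokens.shape
    # 2D real FFT over the two spatial axes; channels are left untouched.
    spectrum = np.fft.rfft2(tokens, axes=(0, 1))
    # Element-wise multiplication in frequency space is equivalent to a
    # circular convolution over the whole grid, so every output position
    # depends on every input position.
    filtered = spectrum * spectral_filter
    # Back to the spatial domain with the original grid size.
    return np.fft.irfft2(filtered, s=(h, w), axes=(0, 1))


rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 8, 4))           # hypothetical 8x8 grid, 4 channels
identity = np.ones((8, 8 // 2 + 1, 4), dtype=complex)
mixed = global_filter(tokens, identity)           # all-ones filter leaves tokens unchanged
```

The FFT-based mixing costs on the order of N log N for N tokens, whereas self-attention scales quadratically in N, which is the usual motivation for replacing attention with spectral filters.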