Distilled Center and Scale Prediction: Distilling anchor-free pedestrian detector for edge computing
Abstract: As an important task in the Internet of Things, pedestrian detection has achieved strong results powered by deep learning. However, common deep-learning-based pedestrian detectors usually require substantial computing resources and consume large amounts of energy, making them unsuitable for edge devices in the edge computing paradigm. For pedestrian detection in edge computing, we therefore apply knowledge distillation to the anchor-free detector Center and Scale Prediction, preserving detection performance while reducing the detector's parameter count and inference time as much as possible. We propose a distillation framework, Distilled Center and Scale Prediction, which implements feature-based and response-based distillation to transfer knowledge from a larger model to a smaller one. To transmit as much detection-relevant information as possible during distillation, Multi-Reference Distillation is designed to filter the transferred knowledge. Moreover, Cross-Module Distillation is proposed to enhance the transfer of relational information during distillation. We conduct experiments on the CityPersons dataset. Our distilled detector achieves 10.45% MR⁻² with a ResNet18 backbone and 10.27% MR⁻² with a ResNet50 backbone, even outperforming the original teacher detectors, while reducing the inference time per image by more than 10% compared to the original teacher detector.
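The abstract combines feature-based and response-based distillation into a single training objective. A minimal sketch of that combination is shown below, assuming simple mean-squared-error terms and hypothetical weighting coefficients `alpha` and `beta`; the paper's actual loss functions, and its Multi-Reference and Cross-Module filtering mechanisms, are not reproduced here.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def distillation_loss(student_feat, teacher_feat,
                      student_out, teacher_out,
                      alpha=0.5, beta=0.5):
    """Combine feature-based and response-based distillation terms.

    student_feat / teacher_feat: intermediate feature maps (same shape,
    e.g. after an adaptation layer on the student side).
    student_out / teacher_out: detector response maps (e.g. center heatmaps).
    alpha, beta are illustrative weights, not values from the paper.
    """
    feat_term = mse(student_feat, teacher_feat)   # feature-based distillation
    resp_term = mse(student_out, teacher_out)     # response-based distillation
    return alpha * feat_term + beta * resp_term

# Toy usage: identical responses, differing features.
s_feat, t_feat = np.zeros((2, 2)), np.ones((2, 2))
s_out, t_out = np.ones(4), np.ones(4)
loss = distillation_loss(s_feat, t_feat, s_out, t_out)  # 0.5 * 1.0 + 0.5 * 0.0
```

In practice both terms would be computed on batched tensors inside the training loop, with the teacher's outputs detached from the gradient computation.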