Localized Semantic Feature Mixers for Efficient Pedestrian Detection in Autonomous Driving

Abdul Hannan Khan, Mohammed Shariq Nawaz, Andreas Dengel

Published: 29 Jun 2023, Last Modified: 05 Mar 2025, CVPR 2023, Everyone, CC BY 4.0

Abstract: Autonomous driving systems rely heavily on the underlying perception module, which needs to be both performant and efficient to allow precise decisions in real time. Avoiding collisions with pedestrians is a top priority in any autonomous driving system. Therefore, pedestrian detection is one of the core parts of such systems’ perception modules. Current state-of-the-art pedestrian detectors have two major issues. Firstly, they have long inference times, which affect the efficiency of the whole perception module, and secondly, their performance in the case of small and heavily occluded pedestrians is poor. We propose Localized Semantic Feature Mixers (LSFM), a novel, anchor-free pedestrian detection architecture. It uses our novel Super Pixel Pyramid Pooling module instead of the computationally costly Feature Pyramid Networks for feature encoding. Moreover, our MLPMixer-based Dense Focal Detection Network is used as a light detection head, reducing computational effort and inference time compared to existing approaches. To boost the performance of the proposed architecture, we adapt and use mixup augmentation, which improves the performance, especially in small and heavily occluded cases. We benchmark LSFM against the state of the art on well-established traffic scene pedestrian datasets. The proposed LSFM achieves state-of-the-art performance on the Caltech, City Persons, Euro City Persons, and TJU-Traffic-Pedestrian datasets while reducing the inference time on average by 55%. Further, LSFM beats the human baseline for the first time in the history of pedestrian detection. Finally, we conducted a cross-dataset evaluation, which showed that our proposed LSFM generalizes well to unseen data.