DeLiVoTr: Deep and light-weight voxel transformer for 3D object detection

Published: 01 Jan 2024, Last Modified: 16 Jul 2025, Intell. Syst. Appl. 2024, CC BY-SA 4.0
Abstract: Highlights
• DeLiVoTr maintains not only the same stride but also the same receptive field, so that it can efficiently detect small objects (pedestrians).
• The DeLiVoTr attention block powers both the intra-region and inter-region voxel transformers to extract local and global voxel features.
• Leveraging layer-level depth and width scaling, we introduce three variants of our model (small, base and large).
• Our method surpasses existing approaches on the small-size pedestrian class, with an inference speed of 20 FPS.