DeLiVoTr: Deep and light-weight voxel transformer for 3D object detection

Gopi Krishna Erabati, Helder Araújo

Published: 2024, Last Modified: 16 Jul 2025. Intell. Syst. Appl. 2024. CC BY-SA 4.0

Abstract: Highlights
• DeLiVoTr maintains not only the same stride but also the receptive field to efficiently detect small-size objects (pedestrians).
• The DeLiVoTr attention block powers both the intra- and inter-region voxel transformers to extract local and global voxel features.
• Leveraging layer-level depth and width scaling, we introduce three variants of our model (small, base and large).
• Our method surpasses existing approaches on the small-size pedestrian class with an inference speed of 20 FPS.