Abstract: Highlights
• DeLiVoTr maintains not only the same stride but also the same receptive field, to efficiently detect small objects (pedestrians).
• The DeLiVoTr attention block powers both the intra- and inter-region voxel transformers to extract local and global voxel features.
• Leveraging layer-level depth and width scaling, we introduce three variants of our model (small, base, and large).
• Our method surpasses existing approaches on the small-size pedestrian class with an inference speed of 20 FPS.