Sparse agent transformer for unified voxel and image feature extraction and fusion

Published: 01 Jan 2024, Last Modified: 25 Oct 2024. Venue: Inf. Fusion 2024. License: CC BY-SA 4.0
Abstract: Highlights
• We propose a 3D multi-modal perception model called Sparse Agent Transformer (SAT).
• Sparse agent attention summarizes and simplifies the input data, avoiding grouping and padding.
• An agent-based fusion method accelerates the perception of 3D multi-modal data.
• Extensive experiments demonstrate the superiority of SAT over other methods.
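The idea behind sparse agent attention can be illustrated with a small sketch. The NumPy code below is a minimal example that assumes the generic two-stage agent-attention pattern. First, a small set of M agent tokens summarizes a scene's variable-length set of N voxel tokens. Then each voxel reads from those M summaries. The cost scales with N·M instead of N², and nothing requires windows of equal size, so each scene can be processed at its own length without grouping voxels into windows or padding them. The functions `agent_attention` and `agent_fusion`, the random (rather than learned) agents, the omission of learned projections, and the fusion scheme, in which the same agents summarize voxel and image tokens together, are all hypothetical. This is not SAT's actual operator or fusion design.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def agent_attention(q, k, v, agents):
    """Generic two-stage agent attention (illustrative, not SAT's exact operator).

    q:      (N, d) query tokens, e.g. the non-empty voxels of ONE scene
    k, v:   (L, d) key/value tokens (L may differ from N)
    agents: (M, d) agent tokens, with M much smaller than N and L
    Cost is O((N + L) * M * d) rather than O(N * L * d) for full attention.
    Real models would apply learned linear projections; they are omitted here.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    # Stage 1: the M agents summarize all L key/value tokens -> (M, d)
    agent_summary = softmax(agents @ k.T * scale, axis=-1) @ v
    # Stage 2: every query attends only to the M summaries -> (N, d)
    return softmax(q @ agents.T * scale, axis=-1) @ agent_summary


def agent_fusion(vox, img, agents):
    """Hypothetical fusion: the same agents summarize voxel AND image tokens,
    and the voxel tokens then read back the fused summaries."""
    kv = np.concatenate([vox, img], axis=0)  # (N + P, d)
    return agent_attention(vox, kv, kv, agents)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, M = 32, 8
    agents = rng.standard_normal((M, d))  # hypothetical: learned in a real model
    # Two scenes with different voxel counts, handled without padding.
    for n_vox in (1000, 37):
        vox = rng.standard_normal((n_vox, d))
        img = rng.standard_normal((500, d))  # e.g. a flattened image feature map
        print(agent_fusion(vox, img, agents).shape)
```

Running the script prints `(1000, 32)` and then `(37, 32)`. Each scene keeps its own voxel count, and the agents act as the fixed-size bottleneck that both modalities pass through.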