Abstract: Highlights•We propose a 3D multi-modal perception model called Sparse Agent Transformer (SAT).•Sparse agent attention summarizes and simplifies data, avoiding grouping and padding.•Agent-based fusion method accelerates the perception of 3D multi-modal data.•Extensive experiments demonstrate the superiority of SAT over other methods.
Loading