Abstract: The View Transformation Module (VTM), which transforms multi-view image features into a
Bird's-Eye-View (BEV) representation, is a crucial component of
camera-based BEV perception systems. Currently, the two
most prominent VTM paradigms are forward projection and
backward projection. Forward projection, represented by
Lift-Splat-Shoot, produces sparse BEV features
unless post-processing is applied. Backward projection, exemplified by BEVFormer, tends to generate false-positive
BEV features from incorrect projections because it does not
use depth information. To address these limitations,
we propose a novel forward-backward view transformation
module. Our approach compensates for the deficiencies
of both existing methods, allowing them to enhance each
other and yield higher-quality BEV representations. We instantiate the proposed module with FB-BEV,
which achieves a new state-of-the-art result of 62.4% NDS
on the nuScenes test set. Code and models are available at
https://github.com/NVlabs/FB-BEV.
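
The two limitations named above can be illustrated with a minimal toy sketch. It is not the FB-BEV implementation: it assumes a single pinhole camera in a 2D world, a 1D image row, and hypothetical constants (`FOCAL`, `CENTER`, the grid size, and the depth values), and all function names are invented for illustration. Forward projection lifts each pixel with its depth and splats it into one BEV cell; backward projection projects every BEV cell into the image and samples a feature without checking depth.

```python
import math

# Toy 2D world: one camera at the origin looking along +x; the BEV plane is (x, y).
# All constants are hypothetical and chosen only to keep the example small.
FOCAL, CENTER = 4.0, 4.0   # pinhole intrinsics for a 1D image row, in pixels
WIDTH = 8                  # number of pixel columns
NX, NY = 8, 8              # BEV grid of 1 m cells: x in [1, 9), y in [-4, 4)


def cell_of(x, y):
    """Return the BEV grid index containing metric point (x, y), or None."""
    ix, iy = math.floor(x - 1.0), math.floor(y + 4.0)
    return (ix, iy) if 0 <= ix < NX and 0 <= iy < NY else None


def forward_project(features, depths):
    """Lift each pixel to a BEV point using its depth, then splat its feature."""
    bev = {}
    for u in range(WIDTH):
        x = depths[u]
        y = (u + 0.5 - CENTER) * x / FOCAL   # back-project the pixel centre
        cell = cell_of(x, y)
        if cell is not None:
            bev[cell] = features[u]          # toy splat: last write wins
    return bev


def backward_project(features):
    """For every BEV cell, project its centre into the image and sample there."""
    bev = {}
    for ix in range(NX):
        for iy in range(NY):
            x, y = ix + 1.5, iy - 3.5        # cell centre in metres
            u = math.floor(FOCAL * y / x + CENTER)
            if 0 <= u < WIDTH:
                bev[(ix, iy)] = features[u]  # no depth check: the whole ray is filled
    return bev


features = [0.0] * WIDTH
features[5] = 1.0            # a single "object" pixel
depths = [4.0] * WIDTH       # assumed per-pixel depth estimate

fwd = forward_project(features, depths)
bwd = backward_project(features)
fwd_hits = [c for c, f in fwd.items() if f == 1.0]
bwd_hits = [c for c, f in bwd.items() if f == 1.0]
print(len(fwd), "of", NX * NY, "cells filled by forward projection")
print(len(fwd_hits), "object cell(s) forward,", len(bwd_hits), "backward")
```

In this toy scene, forward projection can fill at most one BEV cell per pixel, so most of the grid stays empty, while backward projection assigns the object feature to several cells along the same camera ray, only one of which matches the assumed depth.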