OTOcc: Optimal Transport for Occupancy Prediction
Abstract: The autonomous driving community is highly interested in 3D occupancy prediction due to its outstanding geometric perception and object recognition capabilities. However, previous methods
are limited to existing semantic conversion mechanisms for handling the sparse ground-truth problem,
causing excessive computational demands and suboptimal voxel representations. To tackle the above
limitations, we propose OTOcc, a novel 3D occupancy prediction framework that models semantic
conversion from 2D pixels to 3D voxels as an Optimal
Transport (OT) problem, offering accurate semantic mapping that adapts to sparse scenarios without attention or depth estimation. Specifically, the unit
transportation cost between each demander (voxel)
and supplier (pixel) pair is defined as the weighted
occupancy prediction loss. Then, we apply
Sinkhorn-Knopp iteration to find the optimal mapping matrices with minimal total transportation cost.
To reduce the computational cost, we propose a
block reading technique with multi-perspective feature representation, which also brings fine-grained
scene understanding. Extensive experiments show
that OTOcc not only achieves competitive prediction
performance but also reduces computational overhead by more than 4.58% compared to
state-of-the-art methods.
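The abstract casts pixel-to-voxel semantic conversion as an OT problem between suppliers (pixels) and demanders (voxels), solved with Sinkhorn-Knopp iteration. As a minimal illustration of that solver, the sketch below implements standard entropy-regularized Sinkhorn-Knopp on a toy cost matrix; the cost values, masses, and regularization strength `eps` are illustrative assumptions, not the paper's actual weighted occupancy prediction loss.

```python
import numpy as np

def sinkhorn_knopp(cost, supply, demand, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn-Knopp.

    cost:   (m, n) unit transportation cost matrix
    supply: (m,) supplier masses (sums to 1)
    demand: (n,) demander masses (sums to 1)
    Returns an (m, n) transport plan whose marginals
    approximately match `supply` and `demand`.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel from the cost matrix
    u = np.ones_like(supply)
    v = np.ones_like(demand)
    for _ in range(n_iters):
        u = supply / (K @ v)         # rescale rows to match supply
        v = demand / (K.T @ u)       # rescale columns to match demand
    return u[:, None] * K * v[None, :]

# Toy example: 3 suppliers (pixels) -> 2 demanders (voxels),
# with random illustrative costs and uniform masses.
rng = np.random.default_rng(0)
C = rng.random((3, 2))
a = np.full(3, 1 / 3)
b = np.full(2, 1 / 2)
P = sinkhorn_knopp(C, a, b)
```

Each iteration alternately rescales the rows and columns of the kernel, so the plan `P` converges to the minimal-cost coupling under the entropic regularizer; a harder assignment (pixel-to-voxel mapping) can then be read off, e.g., as the argmax along each column.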