OccFlowNet: Occupancy Estimation via Differentiable Rendering and Occupancy Flow

Published: 01 Jan 2025, Last Modified: 08 Oct 2025 · WACV 2025 · CC BY-SA 4.0
Abstract: Semantic occupancy has recently gained significant traction as a prominent 3D scene representation. However, most existing camera-based methods rely on large and costly datasets with fine-grained 3D voxel labels for training, which limits their practicality and scalability. Furthermore, approaches in this domain lack modelling of scene dynamics. In this work, we present a novel approach to occupancy estimation inspired by neural radiance fields (NeRFs), using supervision in 2D based on 3D labels provided by LiDAR, which offers a more natural form of supervision than voxel labels. In particular, we employ differentiable volumetric rendering to predict depth and semantic maps and train a 3D network with supervision in 2D space only. To enhance geometric accuracy and strengthen the supervisory signal, we introduce temporal rendering of adjacent time steps. Additionally, we introduce occupancy flow as a mechanism to handle dynamic objects in the scene and ensure their temporal consistency. Through extensive experimentation, we demonstrate that 2D supervision with LiDAR can achieve state-of-the-art performance compared to methods using voxel labels, and that combining it with 3D voxel supervision, temporal rendering, and occupancy flow significantly outperforms all previous occupancy estimation models. We conclude that the proposed rendering supervision and occupancy flow advance occupancy estimation.
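The abstract's core mechanism is differentiable volumetric rendering of depth and semantic maps from a voxel field. Below is a minimal, hypothetical sketch of NeRF-style alpha compositing along a single camera ray; the function name, inputs, and per-sample formulation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def render_depth_and_semantics(densities, semantic_logits, depths):
    """Hypothetical sketch of differentiable volumetric rendering:
    turn per-sample densities along a ray into compositing weights,
    then take weighted sums to obtain an expected depth and a
    rendered semantic vector (NeRF-style alpha compositing)."""
    # Distances between consecutive samples; the last interval is
    # treated as effectively infinite, as is common in NeRF code.
    deltas = np.diff(depths, append=depths[-1] + 1e10)
    # Per-sample opacity from density and interval length.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * trans
    expected_depth = np.sum(weights * depths)
    rendered_semantics = np.sum(weights[:, None] * semantic_logits, axis=0)
    return expected_depth, rendered_semantics
```

With such a renderer, 2D LiDAR-projected depth and semantic labels can supervise the 3D occupancy network through the rendering weights, since every step above is differentiable with respect to the densities and logits.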