OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

Sophia Sirko-Galouchenko

Published: 04 Oct 2024, Last Modified: 05 Mar 2025; OpenReview Archive Direct Upload; Everyone; CC BY 4.0

Abstract: We introduce OccFeat, a self-supervised pretraining method for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides the model with a 3D geometric understanding of the scene. However, the geometry learned is class-agnostic. Hence, we add semantic information to the model in 3D space through distillation from a self-supervised pretrained image foundation model. Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios. Moreover, empirical results affirm the efficacy of integrating feature distillation with 3D occupancy prediction in our pretraining approach.
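
To make the two pretraining tasks concrete, the following is a minimal sketch of how an occupancy term and a feature distillation term could be combined into one objective. It is an illustration under stated assumptions, not the implementation from the paper: the choice of binary cross-entropy and cosine distance, the restriction of distillation to occupied voxels, the `feat_weight` parameter, and all function names are hypothetical. Per-voxel occupancy targets and teacher features are assumed to be given as inputs.

```python
import math

def occupancy_bce(logits, occupied):
    """Mean binary cross-entropy over voxels (class-agnostic geometry term).

    Assumption: one logit and one 0/1 occupancy target per voxel.
    """
    total = 0.0
    for z, y in zip(logits, occupied):
        # Numerically stable BCE with logits:
        # max(z, 0) - z * y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

def cosine_distance(a, b, eps=1e-8):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)

def distillation_loss(pred_feats, teacher_feats, occupied):
    """Mean cosine distance to teacher features (semantic term).

    Assumption: only occupied voxels carry a teacher feature target.
    """
    dists = [
        cosine_distance(p, t)
        for p, t, y in zip(pred_feats, teacher_feats, occupied)
        if y == 1
    ]
    return sum(dists) / len(dists) if dists else 0.0

def pretraining_loss(logits, pred_feats, teacher_feats, occupied, feat_weight=1.0):
    """Occupancy term plus weighted distillation term (hypothetical weighting)."""
    occ = occupancy_bce(logits, occupied)
    feat = distillation_loss(pred_feats, teacher_feats, occupied)
    return occ + feat_weight * feat

# Toy example: three voxels, two of them occupied, 2-D features.
occupied = [1, 0, 1]
logits = [2.0, -1.5, 0.3]
pred_feats = [[1.0, 0.0], [0.0, 0.0], [0.5, 0.5]]
teacher_feats = [[1.0, 0.1], [0.0, 0.0], [0.0, 1.0]]
loss = pretraining_loss(logits, pred_feats, teacher_feats, occupied)
```

In this sketch the occupancy term alone would train geometry without any class information, while the distillation term pulls the predicted per-voxel features toward those of the image teacher, which mirrors the motivation given in the abstract.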