GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation

Published: 2025, Last Modified: 29 Jan 2026 · WACV 2025 · CC BY-SA 4.0
Abstract: The Bird's-eye View (BeV) representation is widely used for 3D perception from multi-view camera images. It allows features from different cameras to be merged into a common space, providing a unified representation of the 3D scene. The key component is the view transformer, which transforms image views into the BeV. However, current view transformer methods based on geometry or cross-attention do not provide a sufficiently detailed representation of the scene, as they use a sub-sampling of 3D space that is suboptimal for modeling the fine structures of the environment. In this paper, we propose GaussianBeV, a novel method for transforming image features to BeV by finely representing the scene with a set of 3D gaussians located and oriented in 3D space. This representation is then splattered to produce the BeV feature map by adapting recent advances in 3D representation rendering based on gaussian splatting [12]. GaussianBeV is the first approach to use this 3D gaussian modeling and 3D scene rendering process in an optimization-free manner, i.e., without optimizing it on a specific scene, and it is directly integrated into a single-stage model for BeV scene understanding. Experiments show that the proposed representation is highly effective and places GaussianBeV as the new state-of-the-art on the BeV semantic segmentation task on the nuScenes dataset [2].
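To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of splatting a set of feature-carrying 3D gaussians onto a BeV grid: the height axis is dropped and each gaussian's feature vector is accumulated on the ground plane, weighted by a 2D gaussian kernel. The function name, the isotropic covariance, and the additive blending are all simplifying assumptions for illustration; the paper adapts the full anisotropic gaussian splatting renderer [12].

```python
# Illustrative sketch of gaussian-to-BeV splatting, NOT the paper's method.
# Assumptions: isotropic per-gaussian covariance, simple additive blending,
# and hypothetical names (splat_to_bev, cell, sigmas).
import numpy as np

def splat_to_bev(centers, features, sigmas, grid_size=200, cell=0.5):
    """centers: (N, 3) xyz in metres; features: (N, C); sigmas: (N,) std-devs.

    Returns a (grid_size, grid_size, C) BeV feature map covering a square
    of side grid_size * cell metres centred on the ego vehicle.
    """
    C = features.shape[1]
    bev = np.zeros((grid_size, grid_size, C))
    half = grid_size * cell / 2.0
    # Ground-plane coordinates of every BeV cell centre.
    xs = (np.arange(grid_size) + 0.5) * cell - half
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    for mu, f, s in zip(centers, features, sigmas):
        # 2D gaussian weight of this splat at every cell (x and y only;
        # the z coordinate is discarded by the top-down projection).
        w = np.exp(-((gx - mu[0]) ** 2 + (gy - mu[1]) ** 2) / (2.0 * s ** 2))
        bev += w[..., None] * f  # accumulate weighted features
    return bev

# Two toy gaussians with 2-channel one-hot features.
centers = np.array([[0.0, 0.0, 1.0], [10.0, -5.0, 0.5]])
features = np.array([[1.0, 0.0], [0.0, 1.0]])
bev = splat_to_bev(centers, features, sigmas=np.array([1.0, 2.0]))
print(bev.shape)  # (200, 200, 2)
```

In the actual model, the gaussian centers, orientations, and features are predicted by the network from image features rather than specified by hand, which is what makes the renderer usable without per-scene optimization.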