StixelNExT++: Lightweight Monocular Scene Segmentation and Representation for Collective Perception

Published: 01 Jan 2025, Last Modified: 10 Nov 2025, Venue: CoRR 2025, License: CC BY-SA 4.0
Abstract: This paper presents StixelNExT++, a novel approach to scene representation for monocular perception systems. Building on the established Stixel representation, our method infers 3D Stixels and enhances object segmentation by clustering smaller 3D Stixel units. The approach achieves high compression of scene information while remaining adaptable to point cloud and bird's-eye-view representations. Our lightweight neural network, trained on automatically generated LiDAR-based ground truth, achieves real-time performance with computation times as low as 10 ms per frame. Experimental results on the Waymo dataset demonstrate competitive performance within a 30-meter range, highlighting the potential of StixelNExT++ for collective perception in autonomous systems.
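The abstract says the Stixel output stays adaptable to point cloud and bird's-eye-view (BEV) representations. The sketch below shows one way such a conversion could look. It assumes a textbook pinhole camera model and that each 3D Stixel is described by an image column, a top row, a bottom row and a single depth. The abstract does not specify the StixelNExT++ data layout, so every name here (`Stixel`, `Intrinsics`, `stixels_to_points`, `points_to_bev`, `samples_per_stixel`, `cell_size`) is hypothetical. The 30 m default only mirrors the evaluation range quoted above.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Point = Tuple[float, float, float]


@dataclass
class Stixel:
    # Hypothetical layout: image column u (px), top/bottom rows (px), depth (m)
    u: float
    v_top: float
    v_bottom: float
    depth: float


@dataclass
class Intrinsics:
    # Assumed pinhole intrinsics: focal lengths and principal point (px)
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u: float, v: float, z: float, k: Intrinsics) -> Point:
    # Pinhole back-projection of pixel (u, v) at depth z into camera
    # coordinates (X right, Y down, Z forward)
    x = (u - k.cx) * z / k.fx
    y = (v - k.cy) * z / k.fy
    return (x, y, z)


def stixels_to_points(stixels: List[Stixel], k: Intrinsics,
                      samples_per_stixel: int = 5) -> List[Point]:
    # Sample evenly spaced rows between top and bottom of each Stixel;
    # all samples share the Stixel's single depth value (an assumption)
    n = max(2, samples_per_stixel)
    points: List[Point] = []
    for s in stixels:
        for i in range(n):
            v = s.v_top + (s.v_bottom - s.v_top) * i / (n - 1)
            points.append(backproject(s.u, v, s.depth, k))
    return points


def points_to_bev(points: List[Point], cell_size: float = 0.5,
                  max_range: float = 30.0) -> Set[Tuple[int, int]]:
    # Drop the height axis and mark occupied (x, z) ground-plane cells
    # for points within max_range metres ahead of the camera
    cells: Set[Tuple[int, int]] = set()
    for x, _, z in points:
        if 0.0 < z <= max_range:
            cells.add((int(x // cell_size), int(z // cell_size)))
    return cells
```

For example, a Stixel centred on the principal point of a camera with `fx = fy = 1000`, spanning rows 260 to 460 at 10 m depth, becomes five points from Y = -1.0 m to Y = 1.0 m that all fall into a single BEV cell. Keeping one depth per Stixel is what makes the representation compact: a whole vertical segment is stored as four numbers instead of many LiDAR-like points.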