TL;DR: Zero-Shot Lidar Panoptic Scene Completion
Abstract: We propose CAL (Complete Anything in Lidar), a method for Lidar-based shape completion in the wild. The task is closely related to Lidar-based semantic and panoptic scene completion; however, contemporary methods can only complete and recognize objects from the closed vocabularies labeled in existing Lidar datasets. In contrast, our zero-shot approach leverages the temporal context of multi-modal sensor sequences to mine the shapes and semantic features of observed objects, which are then distilled into a Lidar-only instance-level completion and recognition model. Although we mine only partial shape completions, we find that the distilled model learns to infer full object shapes from multiple such partial observations across the dataset. We show that our model can be prompted on standard benchmarks for Semantic and Panoptic Scene Completion, can localize objects as (amodal) 3D bounding boxes, and can recognize objects beyond fixed class vocabularies.
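To make the shape-mining step concrete, below is a minimal sketch (not the authors' implementation) of the core idea: partial Lidar observations of one tracked object are transformed into a common world frame and fused across a sequence into a denser pseudo-ground-truth shape. The data layout and all names (`aggregate_instance_shape`, `points`, `ego_pose`, `instance_mask`) are hypothetical, assumed here only for illustration.

```python
# Hedged sketch of temporal shape mining: fuse per-frame partial views of a
# tracked object into one aggregated shape. Not the paper's actual code;
# field names and the frame format are illustrative assumptions.
import numpy as np

def aggregate_instance_shape(frames):
    """Fuse per-frame object points into one shape in world coordinates.

    frames: list of dicts with
      'points'        -- (N, 3) Lidar points in sensor coordinates
      'ego_pose'      -- (4, 4) sensor-to-world transform for this frame
      'instance_mask' -- (N,) boolean mask selecting the tracked object
    """
    world_points = []
    for frame in frames:
        pts = frame['points'][frame['instance_mask']]      # partial view
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coords
        world_points.append((frame['ego_pose'] @ pts_h.T).T[:, :3])
    # Denser than any single scan, though still only a partial completion.
    return np.concatenate(world_points, axis=0)
```

Aggregates produced this way remain partial, which is consistent with the paper's observation that the distilled model must learn to infer full shapes from many such partial training examples.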
Lay Summary: Advanced self-driving cars and robots often use Lidar sensors to perceive their 3D surroundings. However, Lidar sensors only capture surfaces that are in direct view, resulting in incomplete observations of scene structures and objects. For example, if such a vehicle passes a parked bus, the Lidar sensor may capture only the side of the bus facing the vehicle, missing much of its overall shape. This can lead to poor decisions, such as turning too sharply during a maneuver or misjudging the available parking space. Our method, called Complete Anything in Lidar (CAL), learns to complete the missing parts of any object from just a single Lidar scan. While traditional systems are often trained to recognize a fixed set of object types from manually labeled data, we generate training data for CAL from video and Lidar recordings of real-world scenes without any annotations. As a vehicle moves, it naturally sees the same object from different angles. We use these observations to reconstruct more complete object shapes and to generate training examples without requiring any manual labels. Once trained, CAL can infer the full shape of objects such as vehicles, buildings, or trees from a single scan at test time. It can also complete and identify previously unseen objects, such as trailers, delivery carts, or roadside equipment. By learning to complete objects from partial observations, CAL helps autonomous systems make safer and more informed decisions, even in complex urban environments with heavily occluded objects.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://research.nvidia.com/labs/dvl/projects/complete-anything-lidar/
Primary Area: Applications->Robotics
Keywords: Zero-shot segmentation; Lidar scene completion; Lidar perception
Submission Number: 1654