Scene, Class, Signal: Tri‑Level Adaptation for Synthetic‑to‑Real LiDAR Segmentation

ICLR 2026 Conference Submission 16387 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LiDAR Semantic Segmentation, Unsupervised Domain Adaptation, Synthetic-to-Real
Abstract: Synthetic LiDAR datasets offer a scalable alternative to costly real-world annotations, but models trained on them still exhibit a significant domain gap when applied to real-world data. Previous unsupervised domain adaptation (UDA) methods rely mainly on general adaptation strategies and do not directly address the LiDAR-specific factors behind this gap. In this work, we analyze the synthetic-to-real domain gap from a root-cause-driven perspective, decomposing it into three distinct granularities: scene level, class level, and signal level. At the scene level, we address point-structure distortions caused by real-world sensor effects such as motion blur and rolling shutter. At the class level, we account for the fact that the domain gap varies with the structural complexity and dynamics of each object class. Finally, at the signal level, we tackle the lack of direct, realistic semantic information corresponding to the synthetic input. To address these problems, we propose three corresponding methods. At the scene level, we introduce a style embedding that captures point-structure distortions and serves as a domain cue for adversarial learning. We then extend this scene-level style embedding to the class level to handle the class-dependent domain gap. At the signal level, we propose an intensity-guided self-training scheme that enables the model to learn realistic, implicit semantic information from synthetic inputs. On SynLiDAR→SemanticKITTI our method achieves 44.7 mIoU, and on SynLiDAR→SemanticPOSS it reaches 51.1 mIoU, setting a new state of the art on both benchmarks. Extensive ablation studies validate each component, confirming that our style embedding captures the structural domain gap and that our self-training scheme significantly improves adaptation.
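The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the general pattern it alludes to: using an embedding as a domain cue for adversarial learning via a gradient-reversal domain discriminator (DANN-style). All module names (`StyleEncoder`, `DomainDiscriminator`) and dimensions are hypothetical, not the authors' code.

```python
# Hedged sketch: DANN-style adversarial training on a scene-level
# "style embedding". Hypothetical names and shapes throughout.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses and scales gradients
    on the backward pass, so the encoder learns domain-confusing styles."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class DomainDiscriminator(nn.Module):
    """Predicts synthetic (0) vs. real (1) from a style embedding."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 128), nn.ReLU(inplace=True), nn.Linear(128, 1))

    def forward(self, style, lam=1.0):
        return self.net(GradReverse.apply(style, lam))

# Usage: style embeddings would be pooled from per-point features of
# each scene; random tensors stand in for them here.
disc = DomainDiscriminator(dim=256)
bce = nn.BCEWithLogitsLoss()
style_syn = torch.randn(4, 256, requires_grad=True)   # synthetic scenes
style_real = torch.randn(4, 256, requires_grad=True)  # real scenes
logits = torch.cat([disc(style_syn), disc(style_real)])
labels = torch.cat([torch.zeros(4, 1), torch.ones(4, 1)])
loss_adv = bce(logits, labels)
loss_adv.backward()  # reversed gradients flow back into the encoder
```

The same pattern could in principle be applied per class rather than per scene, which matches the abstract's class-level extension; the intensity-guided self-training scheme is not sketched here, as its specifics are not stated.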
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16387