CACD-SEG: Contrastive Alignment Consistent Distillation for All Day Semantic Segmentation

ICLR 2026 Conference Submission 695 Authors

02 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: semantic segmentation, multi-modal, dataset
TL;DR: event-frame method to tackle all-day semantic segmentation
Abstract: Existing semantic segmentation methods based on traditional frame cameras often struggle in complex lighting conditions, such as low-light nighttime or overexposed scenes, and suffer boundary ambiguities caused by motion blur in high-speed scenarios. Event cameras, with their high dynamic range and high temporal resolution, can effectively alleviate these issues and have consequently attracted increasing attention. However, most existing event-based semantic segmentation methods fuse features by straightforward concatenation, overlooking the heterogeneity between the two modalities. To address these issues, we propose an event-frame alignment-distillation semantic segmentation method. Specifically, we design a heterogeneous feature contrastive alignment module that projects both modalities into a common space to bridge the representation gap. Furthermore, we present a joint boundary-content knowledge distillation module that transfers the sharp region and edge information captured by the event camera to the frame domain, effectively enhancing the robustness of the segmentation results. In addition, we construct the first real-world pixel-aligned event-frame semantic segmentation dataset to enable comprehensive training and evaluation, which will be made publicly available online. Extensive experiments demonstrate the effectiveness of our method.
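The contrastive alignment idea described in the abstract, projecting frame and event features into a common space and pulling pixel-aligned pairs together, can be sketched with an InfoNCE-style objective. This is a minimal illustration under our own assumptions (NumPy features, cosine similarity, diagonal positives), not the authors' actual module:

```python
import numpy as np

def info_nce_alignment(frame_feats, event_feats, temperature=0.07):
    """Hypothetical InfoNCE-style alignment loss between batches of
    pixel-aligned frame and event features of shape (B, D), assumed to
    have already been projected into a shared embedding space."""
    # L2-normalize so the dot product becomes cosine similarity
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    e = event_feats / np.linalg.norm(event_feats, axis=1, keepdims=True)
    logits = f @ e.T / temperature  # (B, B) cross-modal similarity matrix
    # positives sit on the diagonal (same spatial location / sample);
    # all other pairs in the batch act as negatives
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))
# well-aligned modalities: event features are a slightly perturbed copy
aligned = info_nce_alignment(shared, shared + 0.01 * rng.normal(size=(8, 16)))
# mismatched modalities: unrelated event features
random_pair = info_nce_alignment(shared, rng.normal(size=(8, 16)))
print(aligned < random_pair)  # aligned features should yield a lower loss
```

Minimizing such a loss encourages the two heterogeneous encoders to map corresponding frame and event patches to nearby points in the common space, which is the stated purpose of the contrastive alignment module.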
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 695