PolFormer: Event-Only Self-Supervision with Probabilistic Attention for Road Segmentation

ICLR 2026 Conference Submission19200 Authors

19 Sept 2025 (modified: 08 Oct 2025)
Keywords: Event camera, self-supervised, transformer, probabilistic attention, road segmentation
Abstract: Event cameras offer microsecond latency and exceptional dynamic range, making them a natural fit for road segmentation in autonomous driving. Yet their impact has been limited by scarce annotations and the high cost of labeling event streams. Current solutions rely on transferring knowledge from RGB domains, but this dependence erases the very advantages that make event sensing unique. This work breaks the dependence on RGB with an event-native self-supervised transformer architecture that learns rich event-specific semantics directly from raw unlabeled event streams (no frames) through a polarity-guided self-supervised pretext task. To further exploit the spatiotemporal richness of event data, we propose a probabilistic attention mechanism that outperforms standard dot-product attention on this modality. On DSEC-Semantic and DDD17, our approach achieves state-of-the-art road segmentation with orders of magnitude fewer labels. These results establish self-supervision as a scalable and label-efficient paradigm for event-driven vision in autonomous driving.
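The abstract contrasts the proposed probabilistic attention with standard dot-product attention but does not specify the mechanism. As a rough illustration of the distinction, the sketch below compares scaled dot-product attention with a generic probabilistic stand-in that derives weights from a Gaussian kernel over query-key distances (i.e., attention as a mixture model). The Gaussian variant, the `sigma` parameter, and all function names are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    # Standard scaled dot-product attention: similarity = Q.K / sqrt(d).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

def gaussian_attention(Q, K, V, sigma=1.0):
    # Hypothetical probabilistic variant (NOT the paper's mechanism):
    # weights come from a Gaussian kernel on squared query-key distance,
    # so each output row is a posterior-weighted mixture of the values.
    sq_dist = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(axis=-1)
    return softmax(-sq_dist / (2.0 * sigma ** 2)) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries, feature dim 8
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 8))   # 5 values
out_dot = dot_product_attention(Q, K, V)
out_gauss = gaussian_attention(Q, K, V)
# Both map 4 queries to 4 outputs; each output row is a convex
# combination of the rows of V.
```

Both variants produce one output per query; they differ only in how the attention distribution over keys is formed, which is the design axis the abstract highlights for event data.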
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 19200