Keywords: Event Camera, Neuromorphic Computing
Abstract: Dense semantic segmentation in dynamic environments is fundamentally limited by the low-frame-rate (LFR) nature of standard cameras, which creates critical perceptual gaps between frames.
To solve this, we introduce *Anytime Interframe Semantic Segmentation*: a new task of predicting segmentation at arbitrary timestamps between frames, using only a single past RGB frame and a stream of asynchronous event data.
This task presents a core challenge: how to robustly propagate dense semantic features using a motion field derived from sparse and often noisy event data, all while mitigating feature degradation in highly dynamic scenes.
We propose LiFR-Seg, a novel framework that directly addresses these challenges by propagating deep semantic features through time. The core of our method is an *uncertainty-aware warping process*, guided by an event-driven motion field and its learned, explicit confidence. A *temporal memory attention* module further ensures coherence in dynamic scenarios.
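The uncertainty-aware warping described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the nearest-neighbour gather, and the confidence-weighted fallback blend are illustrative assumptions about how a learned confidence map could gate an event-driven motion field.

```python
import numpy as np

def warp_features_with_confidence(feats, flow, conf):
    """Backward-warp a dense feature map along an event-derived motion
    field, down-weighting regions where the field is unreliable.

    feats: (H, W, C) semantic features from the past RGB frame
    flow:  (H, W, 2) per-pixel displacement (dx, dy) estimated from events
    conf:  (H, W)    learned confidence in [0, 1] for the motion field
    Returns warped features blended with the originals: where the motion
    estimate is low-confidence, we fall back to the unwarped features.
    """
    H, W, _ = feats.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Source coordinates: where each target pixel "came from".
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, H - 1)
    warped = feats[src_y, src_x]            # nearest-neighbour gather
    w = conf[..., None]
    return w * warped + (1.0 - w) * feats   # confidence-weighted blend
```

With full confidence the features follow the motion field exactly; with zero confidence the original features pass through untouched, which is one simple way to mitigate feature degradation when the event data is sparse or noisy.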
We validate our method on the DSEC dataset and on SHF-DSEC, a new high-frequency synthetic benchmark we contribute. Remarkably, our LFR system reaches 73.82\% mIoU on DSEC, within 0.09\% of an HFR upper bound that has full access to the target frame.
We further demonstrate superior robustness in extreme scenarios: in highly dynamic (M3ED) tests our method closely matches the HFR baseline, and in low-light (DSEC-Night) evaluation it even surpasses it.
This work presents a new, efficient paradigm for achieving robust, high-frame-rate perception with low-frame-rate hardware.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8929