ROSE: Reduced Overhead Stereo Event-Intensity Depth Estimation

26 Sept 2024 (modified: 13 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Event-based Vision, Stereo Depth Estimation
TL;DR: We propose an extremely efficient learning-based approach for real-time depth estimation from event and intensity data.
Abstract: Stereo depth estimation using event cameras is a promising approach for real-time vision tasks, offering low-latency, high-speed data capture. However, existing methods often suffer from high computational overhead, limiting their real-time applicability. To address these challenges, we introduce ROSE (Reduced Overhead Stereo Event and Intensity), a real-time framework for efficient depth estimation from events and intensity images. Current approaches rely on dense networks that fail to scale with increasing data complexity, constraining both accuracy and speed. In contrast, ROSE incorporates lightweight event representation networks and optimizes the stereo matching process to reduce model size and computational load without compromising accuracy. We replace conventional network components with efficient spatio-temporal representations and streamline adaptive aggregation modules, reducing computational complexity by 1000× compared to previous methods. Furthermore, we adapt event grouping strategies to better align events with intensity images, improving the quality of depth estimation under various lighting and motion conditions. Extensive experiments on the DSEC and MVSEC benchmarks demonstrate that ROSE achieves real-time performance, reaching frame rates of 32.2 FPS on DSEC and 66.9 FPS on MVSEC while maintaining competitive depth accuracy. This marks a significant improvement over prior work in speed and scalability, making ROSE a viable solution for real-time stereo depth estimation in resource-constrained environments. Our code and models will be released to support further advances in the field.
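
The abstract does not spell out ROSE's event representation, but the grouping it describes — collecting the events that fall between consecutive intensity frames into a spatio-temporal tensor — is commonly realized in event-based vision as an event voxel grid. Below is a minimal sketch of that standard construction, assuming an (N, 4) event array of (t, x, y, polarity) rows; the function name `events_to_voxel_grid` and its parameters are illustrative placeholders, not taken from the paper.

```python
import numpy as np

def events_to_voxel_grid(events, t_start, t_end, num_bins, height, width):
    """Accumulate events captured between two intensity-frame timestamps
    into a spatio-temporal voxel grid of shape (num_bins, height, width).

    `events` is an (N, 4) float array of (t, x, y, polarity) rows with
    polarity in {-1, +1}; timestamps are assumed to lie in [t_start, t_end].
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return voxel

    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3]

    # Normalize timestamps to [0, num_bins - 1] so each event maps to a
    # fractional temporal bin between the two intensity frames.
    t_norm = (t - t_start) / max(t_end - t_start, 1e-9) * (num_bins - 1)
    lower = t_norm.astype(int)
    w_upper = t_norm - lower  # fractional distance to the next bin

    # Bilinear weighting along time: split each event's polarity between
    # its two nearest bins, preserving sub-bin timing information.
    np.add.at(voxel, (lower, y, x), p * (1.0 - w_upper))
    upper = np.clip(lower + 1, 0, num_bins - 1)  # guard the last bin
    np.add.at(voxel, (upper, y, x), p * w_upper)
    return voxel
```

The bilinear split across the two nearest temporal bins is what distinguishes this grouping from hard binning: event timing finer than the bin width survives into the tensor, which matters when the representation must stay aligned with intensity-frame timestamps.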
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5776