Combining Raft-Based Stereo Disparity and Optical Flow Models For Scene Flow Estimation

Published: 2024, Last Modified: 28 Jan 2026, ICIP 2024, CC BY-SA 4.0
Abstract: In stereo-based scene flow estimation, two problems are often addressed in tandem: optical flow and stereo depth estimation. Both require dense point matching, but within distinct search domains. Despite these similarities, little work has investigated whether stereo datasets can improve optical flow performance, or vice versa. This paper introduces a conjoined network for optical flow and stereo disparity estimation based on the Recurrent All-Pairs Field Transforms (RAFT) architecture. The network uses a shared backbone, which reduces the parameter count and enables joint training on stereo disparity and optical flow data. Our joint model outperforms the baseline RAFT and RAFT-Stereo models, demonstrating that both dense matching tasks can be addressed effectively with the same encoded features. Experiments show that training on data for each task improves the model's performance on the other. A further advantage of the joint model is that the shared encoder can be trained on more data.
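The reason one encoder can serve both tasks is that RAFT and RAFT-Stereo both build a correlation volume from dot products of encoded features and differ mainly in the search domain: optical flow compares every pixel with every pixel in the other frame, while rectified stereo only compares pixels on the same row. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes a toy per-pixel linear encoder (hypothetical `shared_encoder`) in place of RAFT's convolutional feature network, and omits the correlation pyramid and the recurrent update operator. The functions `flow_correlation` and `stereo_correlation` are illustrative names.

```python
import numpy as np

def shared_encoder(image, weights):
    # Hypothetical stand-in for the shared backbone: a per-pixel linear
    # projection (equivalent to a 1x1 convolution) followed by ReLU.
    # image: (H, W, C_in), weights: (C_in, D) -> features: (H, W, D)
    return np.maximum(image @ weights, 0.0)

def flow_correlation(f1, f2):
    # RAFT-style all-pairs correlation: every pixel of frame 1 is compared with
    # every pixel of frame 2, since optical flow can point in any direction.
    # Shape: (H, W, H, W)
    return np.einsum("ijc,klc->ijkl", f1, f2) / np.sqrt(f1.shape[-1])

def stereo_correlation(fl, fr):
    # RAFT-Stereo-style correlation: for a rectified pair, a left pixel's match
    # lies on the same row of the right image, so only that row is searched.
    # Shape: (H, W, W)
    return np.einsum("ijc,ikc->ijk", fl, fr) / np.sqrt(fl.shape[-1])

rng = np.random.default_rng(0)
H, W, C_IN, D = 4, 6, 3, 8
weights = rng.standard_normal((C_IN, D))  # one set of encoder parameters

frame1, frame2 = rng.standard_normal((2, H, W, C_IN))  # toy flow input pair
left, right = rng.standard_normal((2, H, W, C_IN))     # toy stereo input pair

# Both tasks encode their inputs with the same weights.
flow_vol = flow_correlation(shared_encoder(frame1, weights),
                            shared_encoder(frame2, weights))
fl, fr = shared_encoder(left, weights), shared_encoder(right, weights)
stereo_vol = stereo_correlation(fl, fr)

print(flow_vol.shape, stereo_vol.shape)  # (4, 6, 4, 6) (4, 6, 6)

# The stereo volume is exactly the same-row slice of the all-pairs volume.
full = flow_correlation(fl, fr)
rows = np.arange(H)
print(np.allclose(stereo_vol, full[rows, :, rows, :]))  # True
```

Because gradients from both volumes flow into the same `weights`, flow data and stereo data both update one encoder, which is the intuition behind the cross-task improvement the abstract reports. The final check shows that the stereo search domain is a subset of the flow search domain.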