Abstract: In stereo-based scene flow estimation, two problems are often addressed in tandem—optical flow and stereo depth estimation. Both problems require dense point matching, albeit within distinct search domains. Despite their similarities, little investigation has been conducted into improving optical flow performance through stereo datasets and vice versa. This paper introduces a conjoined network for optical flow and stereo disparity estimation based on the Recurrent All-Pairs Field Transforms (RAFT) architecture. The network features a shared backbone, reducing the parameter count and facilitating joint training on stereo disparity and optical flow data. Our joint model surpasses the baseline RAFT and RAFT-Stereo models, demonstrating that the two dense matching tasks can be effectively addressed using the same encoded features. Experiments show that training on data for each task improves the model's performance on the other task. The joint model also offers the advantage of training the encoder on a larger pool of data.
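The parameter-sharing idea in the abstract can be illustrated with a toy sketch. The names and shapes below are hypothetical stand-ins (the real RAFT backbone is a convolutional encoder); the point is only that one shared encoder feeds two task heads, and that sharing halves the backbone's parameter cost versus two independent encoders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the shared feature encoder: a single
# linear map used by both tasks (the real backbone is a CNN).
W_shared = rng.standard_normal((64, 32))

def encode(image_pair):
    # image_pair: (2, 64) toy "frames"; returns per-frame features.
    return image_pair @ W_shared

# Task-specific matching on the SAME encoded features: optical flow
# searches a 2-D domain (frame t vs. t+1), stereo disparity a 1-D
# horizontal domain (left vs. right). Both reduce here to a toy
# all-pairs correlation score.
def flow_score(feats):
    return feats[0] @ feats[1]

def disparity_score(feats):
    return feats[0] @ feats[1]

pair = rng.standard_normal((2, 64))
feats = encode(pair)

# Sharing the backbone halves its parameter count relative to
# training two independent encoders.
shared_params = W_shared.size        # 2048
separate_params = 2 * W_shared.size  # 4096
print(shared_params, separate_params)
```

In the real model the two heads differ in their search domain and iterative refinement, but both consume the same encoder output, which is why extra data for either task can improve the shared features.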
External IDs: dblp:conf/icip/Pan0AX24