Abstract: Current state-of-the-art optical flow methods are mostly based on dense all-pairs cost volumes. However, as image resolution increases, the computational and space complexity of constructing these cost volumes grows at a quartic rate, making these methods impractical for high-resolution images. In this paper, we propose a novel Hybrid Cost Volume for memory-efficient optical flow, named HCV. To construct HCV, we first propose a Top-k strategy to separate the 4D cost volume into two global 3D cost volumes. These volumes significantly reduce memory usage while retaining a substantial amount of matching information. We further introduce a local 4D cost volume with a local search space to supplement HCV with local information. Based on HCV, we design a memory-efficient optical flow network, named HCVFlow. Compared to recurrent flow methods based on all-pairs cost volumes, our HCVFlow significantly reduces memory consumption while maintaining high accuracy. We validate the effectiveness and efficiency of our method on the Sintel and KITTI datasets and on real-world 4K (2160 × 3840) resolution images. Extensive experiments show that our HCVFlow has very low memory usage and outperforms other memory-efficient methods in terms of accuracy. The code is publicly available at https://github.com/gangweiX/HCVFlow.
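The memory argument in the abstract can be made concrete with a back-of-the-envelope count. The sketch below is illustrative only and is not the authors' exact HCV construction: it contrasts the (H·W)² entries of a dense 4D all-pairs cost volume with a hypothetical pair of global 3D volumes that keep, for each pixel, candidate matches along one spatial dimension each (H·W·W + H·W·H entries), which is the kind of quartic-to-cubic reduction the abstract describes.

```python
def allpairs_cost_volume_elems(h, w):
    # Dense 4D all-pairs cost volume: one correlation score for every
    # pair of pixels -> (H*W) x (H*W) entries, quartic in resolution.
    return (h * w) ** 2

def separated_3d_volumes_elems(h, w):
    # Illustrative decomposition (NOT the paper's exact construction):
    # two global 3D volumes, each matching along one spatial dimension,
    # giving H*W*W + H*W*H entries, cubic in resolution.
    return h * w * w + h * w * h

# Feature maps at 1/8 resolution for a 4K (2160 x 3840) frame.
h, w = 2160 // 8, 3840 // 8   # 270 x 480
dense = allpairs_cost_volume_elems(h, w)
hybrid = separated_3d_volumes_elems(h, w)
print(f"4D all-pairs volume: {dense / 1e9:.1f} G elements")
print(f"two 3D volumes:      {hybrid / 1e6:.1f} M elements")
```

At 4K this is roughly 16.8 billion entries for the dense volume versus under 100 million for the separated volumes, which is why all-pairs methods become impractical at high resolution.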
Primary Subject Area: [Content] Media Interpretation
Relevance To Conference: In the era of exponential growth in multimedia content creation and consumption, there's a pressing need for algorithms that not only excel in performance but also in efficiency. My research introduces a low-memory optical flow algorithm, designed to address this need by enabling advanced video analysis and multimedia content interaction with minimal resource consumption. Optical flow algorithms are pivotal for interpreting dynamic scenes in applications ranging from video content analysis to augmented reality and interactive media. However, their widespread adoption is hampered by high computational and memory requirements.
My work significantly reduces the memory footprint of optical flow computation while maintaining high motion estimation accuracy. This allows sophisticated video analysis techniques to be deployed in resource-constrained environments, such as mobile devices and real-time systems. By offering a solution that bridges the gap between high performance and low resource consumption, my research aligns closely with the conference's theme of novel processing of media-related information for multimedia content interpretation. I believe my findings contribute a valuable perspective to the field, opening new avenues for efficient visual content interpretation.
Submission Number: 1360