SWIFT: Efficient Warping-Only Optical Flow via Scale-Specialized Refinement
Abstract: Optical flow estimation is a fundamental component in many vision systems, including video understanding, autonomous navigation, robotics, and augmented reality. Recent optical flow methods have explored warping-based architectures that eliminate cost volumes to reduce memory consumption. While effective, existing warping-only approaches typically rely on dense, high-resolution iterative refinement and strong feature representations, leading to high inference latency and limited scalability on embedded hardware.
In this work, we present SWIFT (Scale-Warped Iterative Flow Tracking), an efficient warping-only optical flow framework designed for practical deployment. Instead of iteratively refining flow from a zero initialization at a fixed resolution, SWIFT progressively refines a single flow hypothesis across scales. An initial estimate is first computed at 1/16 resolution to capture large displacements at low computational cost. The flow is then refined through feature warping: a lightweight Transformer operates at 1/16 and 1/8 resolution to incorporate global context, followed by efficient convolutional refinement at 1/4 resolution for accurate boundary recovery. This scale-specialized design significantly reduces refinement cost while decreasing reliance on heavy feature encoders. We evaluate SWIFT on standard optical flow benchmarks with a focus on runtime efficiency. Experimental results demonstrate that SWIFT achieves a favorable speed–accuracy trade-off compared with prior warping-based methods. On NVIDIA AGX Orin, SWIFT runs over 6× faster than WAFT while maintaining competitive accuracy. These results suggest that scale-specialized warping and refinement provide an effective direction for efficient warping-only optical flow, particularly for real-time and embedded applications.
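As an illustration of the coarse-to-fine warping scheme described above, the following is a minimal NumPy sketch, not the authors' implementation. The function names (`warp_bilinear`, `upsample_flow_2x`, `coarse_to_fine`), the nearest-neighbour flow upsampling, and the border clamping are all assumptions made for illustration. The `refiners` callables are hypothetical stand-ins for the Transformer (1/16, 1/8) and convolutional (1/4) refinement modules, whose internals the abstract does not specify.

```python
import numpy as np

def warp_bilinear(feat, flow):
    """Backward-warp feat (H, W, C) by flow (H, W, 2) holding (dx, dy) in pixels."""
    h, w = feat.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling positions in the second frame, clamped to the border (assumption).
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    # Bilinear interpolation between the four neighbouring feature vectors.
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def upsample_flow_2x(flow):
    """Nearest-neighbour 2x upsampling; displacements double with resolution."""
    return 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)

def coarse_to_fine(feats1, feats2, refiners, init_flow):
    """Refine one flow hypothesis across scales ordered coarse to fine.

    feats1, feats2: per-scale features of frames 1 and 2 (e.g. 1/16, 1/8, 1/4).
    refiners: hypothetical callables, one per scale, returning a residual flow.
    """
    flow = init_flow
    for i, (f1, f2, refine) in enumerate(zip(feats1, feats2, refiners)):
        if i > 0:
            flow = upsample_flow_2x(flow)      # carry the estimate to the finer scale
        warped = warp_bilinear(f2, flow)       # align frame-2 features to frame 1
        flow = flow + refine(f1, warped, flow) # residual update, no cost volume
    return flow
```

In this sketch each scale sees only the warped features and the current flow, so memory grows with the feature map size rather than with a correlation volume. A real system would presumably replace the nearest-neighbour upsampling with a learned or bilinear scheme.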