Highlights

• Two branches with heterogeneous transformations optimize the video deblurring network.
• A compact extension enables depth-wise separable spatiotemporal convolution.
• A nonlocal fusion layer captures global information in dynamic scenes to refine features.
• Adapting the double attention operation makes nonlocal fusion efficient.
• Combining network and layer decomposition enhances restoration at low computational cost.

Abstract

We present a bi-branch network for efficient dynamic scene deblurring. The challenge is to simultaneously reduce the computational cost and enhance the restoration accuracy. The proposed network conducts heterogeneous transformations on motion and RGB content in an encoder–decoder structure with skip connections. Computational efficiency is achieved by explicitly decomposing the intertwined mapping of spatiotemporal and cross-channel correlations into a motion branch, which processes grayscale frames with our proposed pseudo depth-wise separable 3D convolution, and a color branch, which applies depth-wise separable 2D convolution to RGB content. We refine the features captured by the two branches with a lightweight nonlocal fusion layer that adapts the double attention operation to aggregate heterogeneous transformations, generating for each location in the feature space an output based on its correlation with the entire video clip. Our nonlocal fusion maintains low computational cost on high-resolution frames and operates in a patch-based manner during inference. The proposed architecture strikes the right balance between complexity and accuracy for dynamic scene deblurring. In comparison with state-of-the-art methods, the proposed network is compact and shows competitive restoration accuracy with a significant reduction in computational cost.
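To make the factorization concrete, the sketch below illustrates the general idea behind depth-wise separable convolution, the building block of the color branch: a standard convolution's joint spatial/cross-channel mapping is split into a per-channel spatial step followed by a 1×1 point-wise step. This is a minimal NumPy illustration of the generic technique only, not the paper's pseudo depth-wise separable 3D layer; all function and parameter names are hypothetical.

```python
import numpy as np

def depthwise_separable_conv2d(x, dw_kernels, pw_weights):
    """Hypothetical sketch of depth-wise separable 2D convolution.

    x:          (C, H, W) input feature map
    dw_kernels: (C, k, k) one spatial kernel per input channel (depth-wise step)
    pw_weights: (C_out, C) 1x1 point-wise weights that mix channels
    """
    C, H, W = x.shape
    k = dw_kernels.shape[1]
    Ho, Wo = H - k + 1, W - k + 1  # valid convolution, no padding

    # Depth-wise step: each channel is filtered with its own spatial kernel,
    # so no cross-channel information is mixed here.
    dw = np.zeros((C, Ho, Wo))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                dw[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw_kernels[c])

    # Point-wise step: a 1x1 convolution mixes information across channels.
    return np.einsum("oc,chw->ohw", pw_weights, dw)
```

The parameter count drops from C_out·C·k·k for a standard convolution to C·k·k + C_out·C, which is the kind of layer-level decomposition the abstract refers to; the paper's motion branch extends the same idea to the spatiotemporal (3D) case.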