Abstract: Tensor decompositions are powerful tools for capturing the low-rank structure of dynamic videos. However, existing tensor decompositions primarily consider pixel-wise interactions; they therefore capture only global spatio-temporal correlations and struggle to handle the complex patterns inherent to dynamic videos in real-world applications. To overcome this limitation, we propose a dynamic Bhattacharya-Mesner (DyBM) decomposition, which represents a dynamic video as a sum of terms, each term being the convolution of a BM-rank-1 tensor with a learnable three-dimensional filter. The newly constructed filters enable DyBM decomposition to establish patch-wise interactions in BM-rank-1 tensors, effectively capturing both global and local spatio-temporal correlations in dynamic videos. We further provide a physical interpretation of the factors in DyBM decomposition and offer an in-depth discussion of its relationship to the original BM decomposition. To evaluate the effectiveness of DyBM decomposition, we build a dynamic video recovery model and develop a corresponding optimization algorithm with a theoretical convergence guarantee to solve it. Extensive experiments verify that the DyBM decomposition-based method performs more favorably than state-of-the-art tensor decomposition-based methods, especially for dynamic videos.
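The structure described in the abstract (a sum of terms, each a BM-rank-1 tensor convolved with a three-dimensional filter) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration rather than the authors' formulation: the index convention `T[i, j, k] = a[i, k] * b[i, j] * c[j, k]` for a BM-rank-1 tensor, the circular boundary handling of the convolution, the 3x3x3 filter size, and the hypothetical names `bm_rank1`, `circ_conv3d`, and `dybm_reconstruct`. The filters are random here, whereas in the paper they are learned.

```python
import numpy as np

def bm_rank1(a, b, c):
    """BM-rank-1 tensor built from three matrices (assumed index convention).

    a: (n1, n3), b: (n1, n2), c: (n2, n3)
    result[i, j, k] = a[i, k] * b[i, j] * c[j, k]
    """
    return np.einsum("ik,ij,jk->ijk", a, b, c)

def circ_conv3d(x, kernel):
    """Circular 3-D convolution via FFT (boundary handling is an assumption)."""
    spec = np.fft.fftn(x) * np.fft.fftn(kernel, s=x.shape)
    return np.fft.ifftn(spec).real

def dybm_reconstruct(A, B, C, filters):
    """Sum over terms of a BM-rank-1 tensor convolved with its 3-D filter.

    A: (R, n1, n3), B: (R, n1, n2), C: (R, n2, n3), filters: (R, f1, f2, f3)
    """
    return sum(
        circ_conv3d(bm_rank1(a, b, c), k)
        for a, b, c, k in zip(A, B, C, filters)
    )

# Toy factors for a video of height n1, width n2, and n3 frames.
rng = np.random.default_rng(0)
R, n1, n2, n3 = 3, 8, 9, 5
A = rng.standard_normal((R, n1, n3))
B = rng.standard_normal((R, n1, n2))
C = rng.standard_normal((R, n2, n3))
filters = rng.standard_normal((R, 3, 3, 3))  # random stand-ins for learned filters

video = dybm_reconstruct(A, B, C, filters)
```

In this sketch, setting every filter to a unit impulse at the origin reduces the result to a plain sum of BM-rank-1 tensors, which gives one way to picture how the filters add local, patch-wise mixing on top of the pixel-wise terms.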
External IDs: dblp:journals/tcsv/ZhengZZJL26