Revisiting Learning-based Video Motion Magnification for Real-time Processing

TMLR Paper7334 Authors

04 Feb 2026 (modified: 15 Apr 2026) · Under review for TMLR · CC BY 4.0
Abstract: Video motion magnification is a technique for capturing and amplifying subtle motion in a video that is invisible to the naked eye. Prior deep learning-based work achieves better output quality than conventional signal processing-based methods. However, it still falls short of real-time performance, which prevents its use in various online systems. In this paper, we revisit the first learning-based model and present experimental analyses, in particular on the identification of redundant components, the insertion of spatial bottlenecks, and the trade-off between channel reduction and layer addition. By integrating the findings of each experiment, we present a real-time, deep learning-based motion magnification model that runs 2.7 to 34.9 times faster than existing learning-based methods while maintaining perceptually sufficient generation quality. To the best of our knowledge, this is the first learning-based motion magnification model that runs in real time on Full-HD resolution videos, even without ad hoc quantization.
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We revised the manuscript to address the reviewers’ comments by clarifying claims, improving presentation, and adding new analyses and experiments in the main paper and appendices. Please note that all changes are **marked in blue** within the revised manuscript.
- Reframed the generation-quality claim from “comparable” to “perceptually sufficient” in the Abstract and Sections 1 and 7.
- Clarified that the perceptual loss is a training modification rather than an architectural component, and revised Section 5.4 and Figure 5 accordingly.
- Improved presentation throughout the paper, including grammar and sentence flow, and added/clarified visual guidance in Figure 1, Figure 4, and Section 4.2.
- Added clarification on the interpretation of the magnification factor and fairness of comparison in Section 6.4.
- Expanded the discussion on learned spatial representations in Appendix J.
- Added further discussion and supporting analysis for the 4$\times$ spatial reduction design choice in Section 5.2 and Appendix G.
- Added the training-time overhead of perceptual loss in Section 5.4.
- Clarified terminology and interpretation, including FIR in Section 6.2 and Figure A6, and the 0.01-pixel discussion in Sections 4.3 and 6.3.
- Added new results on generalization in Appendix A, temporal stability in Appendix B, runtime profiling and speed--resolution trade-offs in Appendix I, and challenging failure cases in Appendix J, with additional supporting figures.
Assigned Action Editor: ~Adam_W_Harley1
Submission Number: 7334