FBSVP: Video Prediction Based on Foreground-Background Separation

zhu hong chang; WangDanDan; Faming Fang


21 Sept 2024 (modified: 05 Feb 2025), submitted to ICLR 2025, CC BY 4.0

Keywords: Video Prediction, Foreground-Background Separation

Abstract: Video prediction learns from historical frames to predict future video frames, so learning features from those frames efficiently and with the right focus is a critical step. In a typical video sequence, the background changes little or remains almost constant, while the foreground changes significantly and is therefore the main target of prediction learning. To our knowledge, however, existing video prediction methods do not exploit the different characteristics of the foreground and background to improve prediction accuracy. To leverage these differences, we propose the Foreground-Background Separation Video Prediction (FBSVP) model. A foreground-background separation module first splits the historical video frames into foreground frames and background frames. The video prediction module then learns and predicts the foreground and background separately: a historical attention fusion module uses an attention mechanism to fuse the features of historical frames into the current frame, and a spatio-temporal fusion module then fuses the complementary temporal and spatial features. Finally, a foreground-background fusion module combines the learned foreground and background features to predict the final video frame. Experimental results show that FBSVP achieves the best performance on popular video prediction datasets, demonstrating its competitiveness in this field.
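
To make the first two stages of the pipeline concrete, here is a minimal sketch, not the authors' implementation. The abstract does not say how the separation or attention modules are built, so this sketch substitutes two classical stand-ins: temporal-median background subtraction for the separation step, and softmax-weighted dot-product attention over whole frames for the historical fusion step. The function names `separate` and `attention_fuse`, the `threshold` parameter, and the toy frames are all hypothetical.

```python
import math
from statistics import median


def separate(frames, threshold=0.1):
    """Split grayscale frames (lists of rows) into foregrounds and one background.

    Assumption: the background is the per-pixel temporal median, and a pixel
    is foreground when it differs from the background by more than `threshold`.
    """
    height, width = len(frames[0]), len(frames[0][0])
    background = [
        [median(f[y][x] for f in frames) for x in range(width)]
        for y in range(height)
    ]
    foregrounds = [
        [
            [f[y][x] if abs(f[y][x] - background[y][x]) > threshold else 0.0
             for x in range(width)]
            for y in range(height)
        ]
        for f in frames
    ]
    return foregrounds, background


def attention_fuse(history, current):
    """Fuse history frames using softmax weights from similarity to `current`."""
    def flatten(frame):
        return [v for row in frame for v in row]

    query = flatten(current)
    scale = math.sqrt(len(query))
    # Scaled dot-product score between the current frame and each history frame.
    scores = [
        sum(q * k for q, k in zip(query, flatten(h))) / scale for h in history
    ]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    height, width = len(current), len(current[0])
    fused = [
        [sum(w * h[y][x] for w, h in zip(weights, history)) for x in range(width)]
        for y in range(height)
    ]
    return fused, weights


# Toy clip: a bright pixel moves left to right over a constant 0.2 background.
frames = [
    [[1.0, 0.2, 0.2], [0.2, 0.2, 0.2]],
    [[0.2, 1.0, 0.2], [0.2, 0.2, 0.2]],
    [[0.2, 0.2, 1.0], [0.2, 0.2, 0.2]],
]
foregrounds, background = separate(frames)
fused, weights = attention_fuse(foregrounds[:2], foregrounds[2])
```

In this toy clip the median recovers the constant background, and each foreground frame keeps only the moving pixel. A learned model would replace both functions with trainable modules, and would add the spatio-temporal fusion and foreground-background fusion stages that this sketch omits.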

Primary Area: applications to computer vision, audio, language, and other modalities


Submission Number: 2387
