DV-PredNet: Biologically Plausible Video Next Frame Prediction with Higher-level Semantics

Published: 02 Oct 2025, Last Modified: 10 Oct 2025, RIWM Non Archival, CC BY 4.0
Keywords: Computer Vision, Machine Learning, Predictive Coding, Deep Learning, Intuitive Physics
Abstract: This paper investigates biologically plausible video next-frame prediction in the domain of high-frequency physical interactions. We explore the limitations of PredNet, a deep network implementing predictive coding, on a custom dataset designed to isolate the spatiotemporal behaviors of dynamic objects. To address these limitations, we introduce DV-PredNet (Dorsal+Ventral PredNet), a disentangled, two-stream architecture that separately models physical dynamics ('where') and visual appearance ('what'). Our model demonstrates improvements in both visual fidelity and trajectory tracking. However, we identify a characteristic performance degradation during high-impact events, such as collisions. In these events, the model prioritizes learned visual statistics over physical consistency, resulting in a persistent one-frame lag. This reactive behavior reveals a fundamental limitation of the predictive coding framework with purely implicit physics learning, pointing towards the need for stronger physical priors or hybrid architectures to achieve physically reliable dynamics.
Submission Number: 16
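
The one-frame lag mentioned in the abstract can be made concrete with a small diagnostic. The sketch below is not the authors' evaluation code; it is a minimal illustration, assuming that predicted and ground-truth object positions are available as per-frame arrays. The function name `estimate_lag` and the parameter `max_lag` are hypothetical. It searches for the frame shift at which the predicted trajectory best matches the true one, so a result of 1 would indicate a prediction that trails the ground truth by one frame.

```python
import numpy as np

def estimate_lag(pred_pos, true_pos, max_lag=3):
    """Return (k, errors), where k is the shift in frames for which
    pred_pos[t] best matches true_pos[t - k] in mean squared error.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    pred_pos = np.asarray(pred_pos, dtype=float)
    true_pos = np.asarray(true_pos, dtype=float)
    errors = []
    for k in range(max_lag + 1):
        # Align pred_pos[t] with true_pos[t - k] over the overlapping frames.
        p = pred_pos[k:]
        g = true_pos[:len(true_pos) - k]
        errors.append(float(np.mean((p - g) ** 2)))
    return int(np.argmin(errors)), errors

# Synthetic 1-D "bounce": the object falls to 0 at frame 10, then rebounds.
t = np.arange(20)
true_pos = np.abs(10.0 - t)
# A purely reactive predictor that repeats the previous true position.
pred_pos = np.concatenate([true_pos[:1], true_pos[:-1]])

lag, errors = estimate_lag(pred_pos, true_pos)
print(lag, errors)
```

On this synthetic trajectory the reactive predictor matches the ground truth exactly at a shift of one frame, which is the signature of the lag described above. Real predictions would not align perfectly, so the per-shift errors are returned alongside the best shift for inspection.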