# VideoTitans: Scalable Video Prediction with Integrated Short- and Long-term Memory

**The example defaults to 50 epochs. To reproduce the results in our paper, please read the paper and train for 1000~2000 epochs!**

This repository contains the implementation code for the paper:

**VideoTitans: Scalable Video Prediction with Integrated Short- and Long-term Memory**

## Introduction

Accurate video forecasting enables autonomous vehicles to anticipate hazards, robotics and surveillance systems to predict human intent, and environmental models to issue timely warnings for extreme weather events. However, existing methods remain limited. Transformers rely on global attention with quadratic complexity, which makes them impractical for high-resolution, long-horizon video prediction. Convolutional and recurrent networks suffer from short-range receptive fields and vanishing gradients, so they lose key information over extended sequences. To overcome these challenges, we introduce *VideoTitans*, the first architecture to adapt the gradient-driven *Titans* memory, originally designed for language modelling, to video prediction. VideoTitans integrates three core ideas: (i) a sliding-window attention core that scales linearly with sequence length and spatial resolution, (ii) an episodic memory that dynamically retains only informative tokens based on a gradient-based *surprise* signal, and (iii) a small set of persistent tokens encoding task-specific priors that stabilize training and enhance generalization. Extensive experiments on the Moving-MNIST, Human3.6M, TrafficBJ, and WeatherBench benchmarks show that VideoTitans consistently reduces computation (FLOPs) and achieves competitive visual fidelity compared to state-of-the-art recurrent, convolutional, and efficient-transformer methods. Comprehensive ablations confirm that each proposed component contributes significantly. Code, checkpoints, and demonstration videos will be publicly available to ensure reproducibility and promote further research.
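
To illustrate why a sliding-window attention core scales linearly while global attention scales quadratically, the following is a minimal pure-Python sketch that counts the (query, key) pairs each scheme evaluates. It is not the code in `model.py`: the function names `causal_window_mask` and `count_pairs`, the causal (look-back only) window, and the window size of 4 are all assumptions chosen for illustration.

```python
def causal_window_mask(seq_len, window):
    """Boolean mask where mask[i][j] is True if query i may attend to key j.

    Assumption for illustration: each token attends to itself and to at most
    `window - 1` preceding tokens. The window shape used by VideoTitans may differ.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def count_pairs(mask):
    """Number of (query, key) pairs the mask allows, a proxy for attention cost."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    window = 4  # hypothetical window size
    for seq_len in (8, 16, 32):
        local_cost = count_pairs(causal_window_mask(seq_len, window))
        global_cost = seq_len * seq_len  # full attention scores every pair
        print(f"tokens={seq_len:3d}  windowed={local_cost:4d}  global={global_cost:5d}")
```

Doubling the number of tokens roughly doubles the windowed count (each extra token adds at most `window` pairs), whereas the global count quadruples.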

## Dependencies
* torch
* scikit-image=0.16.2
* numpy
* argparse
* tqdm
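
As a quick sanity check before training, a short script along these lines can report which of the packages above cannot be imported. It is only a sketch: the helper name `missing_packages` and the `REQUIRED` mapping are hypothetical and not part of this repository. Note that scikit-image is imported as `skimage`, and the script checks importability only, not the pinned version.

```python
from importlib.util import find_spec

# Map each package name from the list above to its import name.
# scikit-image is imported as `skimage`; the others use their own name.
REQUIRED = {
    "torch": "torch",
    "scikit-image": "skimage",
    "numpy": "numpy",
    "argparse": "argparse",
    "tqdm": "tqdm",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```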

## Overview

* `API/` contains dataloaders and metrics.
* `main.py` is the executable Python script and accepts command-line arguments.
* `model.py` contains the VideoTitans model.
* `exp.py` is the core file for training, validating, and testing pipelines.

## Install

This project provides a conda environment file, `environment.yml`. To reproduce the environment, run the following commands:
```
  conda env create -f environment.yml
  conda activate Videotitan
```

### Moving MNIST dataset

```
  cd ./data/moving_mnist
  bash download_mmnist.sh
```
