# Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity 
<p align="center">
<img src=assets/first_fig.png />
</p>

## News
- May 18, 2023. Project release.

## MinD-Video
**MinD-Video** is a framework for reconstructing videos from brain recordings.

## Abstract
Reconstructing human vision from brain activities has been an appealing task that helps to understand our cognitive process. Even though recent research has seen great success in reconstructing static images from non-invasive brain recordings, work on recovering continuous visual experiences in the form of videos is limited.
In this work, we propose MinD-Video that learns spatiotemporal information from continuous fMRI data of the cerebral cortex
progressively through masked brain modeling, multimodal contrastive learning with spatiotemporal attention, and co-training with an augmented Stable Diffusion model that incorporates network temporal inflation. 
We show that high-quality videos of arbitrary frame rates can be reconstructed with MinD-Video using adversarial guidance. The recovered videos were evaluated with various semantic and pixel-level metrics. We achieved an average accuracy of 85% in semantic classification tasks and 0.19 in structural similarity index (SSIM), outperforming the previous state-of-the-art by 45%. We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.

## Overview
![flowchart-img](assets/flowchart.png)

## Samples
Some samples are shown below. More samples can be found in the [supplementary material](https://drive.google.com/drive/folders/1swYQD-69phlJUz4_HmdM0RFk_7okLK4v?usp=sharing).
<table>
  <tr>
      <td> &nbsp; &nbsp; &nbsp; &nbsp; GT&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Ours</td>
      <td> &nbsp; &nbsp; &nbsp; &nbsp; GT&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Ours</td>
      <td> &nbsp; &nbsp; &nbsp; &nbsp; GT&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Ours</td>
      <td> &nbsp; &nbsp; &nbsp; &nbsp; GT&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Ours</td>
      <td> &nbsp; &nbsp; &nbsp; &nbsp; GT&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Ours</td>
  </tr>
  <tr>
      <td> <img src="assets/gif/test140.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test227.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test271.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test368.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test333.gif" width = 200 height = 100 ></td>
  </tr> 
  <tr>
      <td> <img src="assets/gif/test381.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test385.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test403.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test406.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test463.gif" width = 200 height = 100 ></td>
    
  </tr>

  <tr>
      <td> <img src="assets/gif/test556.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test669.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test708.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test1011.gif" width = 200 height = 100 ></td>
      <td> <img src="assets/gif/test582.gif" width = 200 height = 100 ></td>
    
  </tr>
</table>

## Environment setup
Create and activate a conda environment named ```mind-video``` from our ```env.yaml```:
```sh
conda env create -f env.yaml
conda activate mind-video
```

## Instructions
The large-scale pre-training dataset can be downloaded from [HCP](https://www.humanconnectome.org/). Please refer to this [repo](https://github.com/zjc062/mind-vis) for the large-scale pre-training scripts.
Our target dataset, Wen (2018), can be downloaded from [here](https://purr.purdue.edu/publications/2809/1).

- Run the multimodal contrastive learning
```sh
python scripts/contrastive_tuning.py --config configs/contrastive_tuning_sub1.yaml
```

- Run the Stable Diffusion model tuning
```sh
python scripts/video_tune.py --config configs/video_tune.yaml
```

- Run the co-training
```sh
python scripts/co_train.py --config configs/run_gm_sub1.yaml
```

- Run generation with checkpoints
```sh
python scripts/eval_all.py --config configs/eval_all_sub1.yaml
```
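
The four steps above can be chained in one wrapper script. The following is only a minimal sketch: it assumes the stages are meant to run in the order listed, and the names `DRY_RUN`, `run_stage`, and `run_pipeline` are hypothetical helpers that are not part of this repository. The script and config paths are copied from the commands above.

```sh
#!/bin/sh
# Sketch of a pipeline wrapper (hypothetical helper, not shipped with the repo).
# Assumption: the stages run in the order they are listed in this README.
set -eu

# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run_stage() {
  # $1 = script path, $2 = config path
  echo "python $1 --config $2"
  if [ "$DRY_RUN" != "1" ]; then
    python "$1" --config "$2"
  fi
}

run_pipeline() {
  run_stage scripts/contrastive_tuning.py configs/contrastive_tuning_sub1.yaml
  run_stage scripts/video_tune.py configs/video_tune.yaml
  run_stage scripts/co_train.py configs/run_gm_sub1.yaml
  run_stage scripts/eval_all.py configs/eval_all_sub1.yaml
}

run_pipeline
```

Saved under a name of your choice, running it as is prints the four commands without executing them. With `DRY_RUN=0`, `set -e` stops the script at the first stage that fails, so later stages do not run after an error.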
