# MMC Transformer: Multiscale Multigrid Comparator Transformer for Few-shot Video Segmentation

## Getting Started

### Minimum requirements

Software:
+ torch==1.9.0
+ numpy==1.18.4
+ cv2==4.2.0 (provided by the `opencv-python` package)
+ pyyaml==5.3.1
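
To confirm that an environment matches the pins above, a small check along the following lines may help. It is only a sketch: the helper names `installed_version` and `check_requirements` are hypothetical and not part of this repository, and the keys are import names (`cv2` for OpenCV, `yaml` for pyyaml). Versions are compared by prefix, on the assumption that local build tags such as `1.9.0+cu111` should still count as a match.

```python
from importlib import import_module

# Pinned versions copied from the list above; keys are import names,
# so pyyaml appears as "yaml" and OpenCV as "cv2".
REQUIRED = {"torch": "1.9.0", "numpy": "1.18.4", "cv2": "4.2.0", "yaml": "5.3.1"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        return getattr(import_module(module_name), "__version__", None)
    except ImportError:
        return None


def check_requirements(required=REQUIRED, lookup=installed_version):
    """Return {module: (expected, found)} for every module that does not match."""
    mismatches = {}
    for name, expected in required.items():
        found = lookup(name)
        # Prefix comparison (an assumption): "1.9.0+cu111" still matches "1.9.0".
        if found is None or not found.startswith(expected):
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `check_requirements()` returns an empty dictionary when every pinned package is importable at the listed version; otherwise each entry shows the expected and the found version.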

### Download data

#### Download YTVIS dataset
* Use the 2019 version of YouTube-VIS (YTVIS), the same version used by [DANet](https://github.com/scutpaul/DANet).

### Download pre-trained models

#### Pre-trained backbones
First, download the ImageNet pre-trained backbones provided by RePRI from [here](https://drive.google.com/drive/folders/1Hrz1wOxOZm4nIIS7UMJeL79AQrdvpj6v) and put them under `initmodel/`. They are used if you decide to train your models from scratch.

### Download trained models
* You can download trained models for both the baseline and the MMC Transformer for the four splits of YouTube-VIS from [here](https://www.dropbox.com/s/5kbq3yjra2b1aav/trained_weights.zip?dl=0).
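
Before running inference, it can be useful to check that the downloaded files ended up where the commands in this README expect them. The sketch below assumes that the scripts are run from the repository root and that the archive is extracted to `trained_weights/` with `mmctransformer/` and `baseline/` subfolders, as the inference commands suggest; the helper `missing_dirs` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Directory names taken from this README; the layout inside each
# directory is not checked because it is not documented here.
EXPECTED_DIRS = [
    "initmodel",
    "trained_weights/mmctransformer",
    "trained_weights/baseline",
]


def missing_dirs(repo_root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel in expected:
        path = root / rel
        # A directory counts as missing if it does not exist or contains nothing.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing
```

An empty list from `missing_dirs()` means all three directories exist and contain at least one entry.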

## Inference
* Run a quick evaluation, in which the videos are uniformly subsampled, to compare the baseline against our model.

Our model
```
bash scripts/test_quick.sh inference/hsnet_transformer/ytvis_bidir 5 [0] 50 trained_weights/mmctransformer/ False False
```

Baseline
```
bash scripts/test_quick.sh inference/hsnet/ytvis 5 [1] 50 trained_weights/baseline/ False False
```

* Reproduce the full-evaluation results on YouTube-VIS for the baseline and our model, as reported in the paper for the state-of-the-art comparison.

Our model
```
bash scripts/test.sh inference/hsnet_transformer/ytvis_bidir 5 [0] 50 trained_weights/mmctransformer/ False False
```

Baseline
```
bash scripts/test.sh inference/hsnet/ytvis 5 [1] 50 trained_weights/baseline/ False False
```
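
The four commands above differ only in the script and the argument list, so they can be driven from one place. The following is a minimal sketch, assuming it is run from the repository root; `RUNS`, `build_command`, and `run_all` are hypothetical names, and the positional arguments are copied verbatim from the commands above without interpreting their meaning.

```python
import subprocess

# Positional arguments copied verbatim from the commands in this README.
# Their meaning is defined by the shell scripts and is not interpreted here.
RUNS = {
    "mmc-transformer": ["inference/hsnet_transformer/ytvis_bidir", "5", "[0]", "50",
                        "trained_weights/mmctransformer/", "False", "False"],
    "baseline": ["inference/hsnet/ytvis", "5", "[1]", "50",
                 "trained_weights/baseline/", "False", "False"],
}


def build_command(model, quick=True):
    """Build the argument list for the quick or the full evaluation script."""
    script = "scripts/test_quick.sh" if quick else "scripts/test.sh"
    return ["bash", script] + RUNS[model]


def run_all(quick=True, runner=subprocess.run):
    """Run our model and then the baseline, stopping on the first failure."""
    for model in RUNS:
        # check=True makes subprocess.run raise if the script exits non-zero.
        runner(build_command(model, quick), check=True)
```

`run_all()` launches the quick evaluation for both models, and `run_all(quick=False)` launches the full evaluation. Passing the arguments as a list avoids shell interpretation of the square brackets.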

## Acknowledgments

We gratefully thank the authors of https://github.com/mboudiaf/RePRI-for-Few-Shot-Segmentation, whose code we build upon.
We also relied on https://github.com/scutpaul/DANet to understand their episodic version of YouTube-VIS.
