## Reconstruction-Consistent Masked Auto-Encoders

This is the official implementation of the paper "Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders".

### Installation
Our implementation is built on MAE's official code, so just follow MAE's installation [guideline](https://github.com/facebookresearch/mae).

* All pre-training, fine-tuning, and linear probing experiments were conducted on an 8-GPU machine, so our scripts are set to use 8 GPUs.

* This code includes an implementation of RC-MAE-S, which uses the same masks for both the student and teacher networks.
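
To make the shared-mask idea concrete, here is a minimal, dependency-free sketch and not the code used in this repository. It assumes MAE-style random masking and a mean teacher whose weights are an exponential moving average (EMA) of the student's. The function names `random_mask` and `ema_update`, the 196-patch grid, the 0.75 mask ratio, and the momentum value are all illustrative assumptions.

```python
import random


def random_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of the patches that stay visible."""
    num_keep = int(num_patches * (1 - mask_ratio))
    ids = list(range(num_patches))
    rng.shuffle(ids)  # random permutation of patch indices
    return sorted(ids[:num_keep])


def ema_update(teacher, student, momentum):
    """Move each teacher weight toward the matching student weight."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


rng = random.Random(0)

# Sample one mask and hand the same visible patches to both networks
# (the shared-mask setting; patch count and ratio are assumed values).
visible = random_mask(num_patches=196, mask_ratio=0.75, rng=rng)
student_visible = visible
teacher_visible = visible

# Toy weight vectors standing in for real model parameters.
teacher_weights = ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.5)
```

Sampling the mask once and reusing it means the student and teacher reconstruct from identical visible patches, so their outputs can be compared directly.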



### Pre-training & Fine-tuning



For RC-MAE with ViT-Base for 1600 epochs:

```bash
bash scripts/pretrain_finetune_rc_v1b_vit_base_1600ep.sh <path/to/imagenet> <path/to/output_dir>
```

For RC-MAE with ViT-Large for 1600 epochs:

```bash
bash scripts/pretrain_finetune_rc_v1b_vit_large_1600ep.sh <path/to/imagenet> <path/to/output_dir>
```

### Linear probing

```bash
bash scripts/linear_model.sh
```
