# Purpose
This code runs the baseline and self-training experiments on CIFAR10, ImageNet, Penn Tree Bank, and our synthetic Gaussian mixture model.

# Usage

## CIFAR10
We use `cifar10/main.py` to train on CIFAR10 with the standard baseline training algorithm.

## ImageNet

We use `imagenet/main.py` to train on ImageNet with the standard training procedure.

## Penn Tree Bank

We use the well-known fairseq repository to train on PTB.

The steps below are based on the `ptb/fairseq` directory.

Download the PTB dataset manually, then run the following command with `TEXT` set to the path of the PTB dataset directory.
```bash
fairseq-preprocess \
  --only-source \
  --trainpref $TEXT/ptb.train.txt \
  --validpref $TEXT/ptb.valid.txt \
  --testpref $TEXT/ptb.test.txt \
  --destdir data-bin/ptb \
  --workers 20
```
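
Since the dataset is downloaded by hand, it can help to confirm that the three split files are in place before preprocessing. The following is a minimal sketch: `check_ptb_files` is a hypothetical helper, not part of fairseq, and it assumes the file names used in the command above.

```bash
# Hypothetical helper (not part of fairseq): check that the three PTB
# split files exist in the given directory before running fairseq-preprocess.
check_ptb_files() {
  local dir="$1"
  local missing=0
  local split
  for split in train valid test; do
    if [ ! -f "$dir/ptb.$split.txt" ]; then
      # Report each missing file on stderr and remember the failure.
      echo "missing: $dir/ptb.$split.txt" >&2
      missing=1
    fi
  done
  # Exit status 0 if all files are present, 1 otherwise.
  return "$missing"
}
```

For example, `check_ptb_files "$TEXT" && echo "PTB files found"` prints the message only when all three files are present.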

We then train the language model with the following command:
```bash
fairseq-train --task language_modeling \
  data-bin/ptb \
  --save-dir ptb \
  --arch transformer_lm --share-decoder-input-output-embed \
  --dropout 0.1 \
  --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0 \
  --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
  --tokens-per-sample 512 --sample-break-mode none \
  --max-tokens 2048 --update-freq 16 \
  --fp16 \
  --max-update 80000 2>&1 | tee trainptb.log
```
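
With `--max-tokens 2048` and `--update-freq 16`, gradients are accumulated over 16 batches of at most 2048 tokens each, so every optimizer update sees up to the number of tokens computed below. This is a small sketch of that arithmetic; `num_gpus=1` is an assumption, since the number of GPUs is not stated here.

```bash
# Values taken from the fairseq-train command above.
max_tokens=2048    # --max-tokens: upper bound on tokens per batch (per GPU)
update_freq=16     # --update-freq: batches accumulated per optimizer update
num_gpus=1         # assumption: single-GPU training

# Upper bound on the number of tokens contributing to one optimizer update.
tokens_per_update=$((max_tokens * update_freq * num_gpus))
echo "tokens per update (upper bound): $tokens_per_update"
```

If memory is tight, lowering `--max-tokens` while raising `--update-freq` by the same factor keeps this product unchanged.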


## Linear

To use this part, open `linear/linear.ipynb` and run the first block to generate the bash scripts.

Run all of the generated scripts to produce the corresponding statistics, then use the remaining blocks to plot the aggregated statistics.

# Requirements
* PyTorch 1.11.3
* CUDA and cuDNN
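
One way to confirm that these requirements are met is to ask PyTorch what it sees. The sketch below is an assumption about how you might check the environment, not part of this repository; `check_env` is a hypothetical helper, and the `PYTHON` variable (defaulting to `python`) selects the interpreter.

```bash
# Hypothetical helper: print the installed PyTorch version and whether
# CUDA and cuDNN are visible to it. Fails if the interpreter or torch is missing.
check_env() {
  "${PYTHON:-python}" -c '
import torch
print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
# cudnn.version() returns None when cuDNN is not available.
print("cudnn version:", torch.backends.cudnn.version())
'
}
```

Running `check_env` in the training environment prints three lines; a non-zero exit status means the interpreter could not import `torch`.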

# Acknowledgement
The code is based on three GitHub repositories:

**ImageNet:** https://github.com/jiweibo/ImageNet

**CIFAR10:** https://github.com/kuangliu/pytorch-cifar

**Penn Tree Bank:** https://github.com/facebookresearch/fairseq


