## Transformer experiments with BERT
1. PAC-Bayes training with a scalar prior
```
python transformer_autotuning.py --num_labels 2 --task_name SST --train_data ./data/SST/train.set.txt --test_data ./data/SST/test.set.txt --lr 0.01 --batch_size 100 --train_size 100 --shift 200 --max_epoch 260 --refine_gamma 1
```
2. Baseline training
```
python transformer_autotuning.py --num_labels 2 --task_name SST --train_data ./data/SST/train.set.txt --test_data ./data/SST/test.set.txt --lr 0.01 --batch_size 100 --train_size 100 --max_epoch 260 --method baseline
```
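
The two commands share most of their arguments. The PAC-Bayes run adds `--shift 200 --refine_gamma 1`, while the baseline run adds `--method baseline`. The sketch below makes that difference explicit by assembling both command lines from a common argument list. The names `COMMON_ARGS`, `EXTRA_ARGS`, and `build_command` are hypothetical helpers, not part of this repository, and the sketch assumes the script does not depend on the order of its flags.

```python
import shlex

# Arguments shared by both runs, copied from the commands above.
COMMON_ARGS = [
    "--num_labels", "2",
    "--task_name", "SST",
    "--train_data", "./data/SST/train.set.txt",
    "--test_data", "./data/SST/test.set.txt",
    "--lr", "0.01",
    "--batch_size", "100",
    "--train_size", "100",
    "--max_epoch", "260",
]

# Arguments that differ between the two runs (keys are illustrative labels).
EXTRA_ARGS = {
    "pac_bayes": ["--shift", "200", "--refine_gamma", "1"],
    "baseline": ["--method", "baseline"],
}


def build_command(run):
    """Return the full argument list for one run (hypothetical helper)."""
    return ["python", "transformer_autotuning.py", *COMMON_ARGS, *EXTRA_ARGS[run]]


# Print both command lines so they can be inspected or pasted into a shell.
for run in EXTRA_ARGS:
    print(shlex.join(build_command(run)))
```

To launch a run from Python rather than printing it, the returned list can be passed to `subprocess.run(build_command("baseline"), check=True)` from the repository root.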
