# Training GPT-2 on Shakespeare-Char

## Installation

```sh
pip install torch numpy transformers datasets tiktoken wandb tqdm
```



## Data download
```sh
python data/shakespeare_char/prepare.py
```

This creates `train.bin` and `val.bin` in the `data/shakespeare_char/` directory.
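
As a quick sanity check after preparation, the sketch below reports how many tokens each file holds. It assumes the files are flat arrays of 2-byte (`uint16`) token ids, as in upstream nanoGPT; `count_tokens` is a hypothetical helper and not part of this repository.

```sh
# Hypothetical helper: number of tokens in a .bin file,
# assuming 2 bytes (uint16) per token as in upstream nanoGPT.
count_tokens() {
    bytes=$(wc -c < "$1")
    echo $((bytes / 2))
}

# Report the token count for each split, or a hint if the file is missing.
for split in train val; do
    f="data/shakespeare_char/$split.bin"
    if [ -f "$f" ]; then
        echo "$split: $(count_tokens "$f") tokens"
    else
        echo "$split: $f not found, run prepare.py first"
    fi
done
```

If the reported counts look off (for example, zero tokens), rerunning the preparation script is the first thing to try.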

## Training

```sh
python train.py config/shakespeare_ivonpcm.py
```
or 
```sh
python train.py config/shakespeare_ivon.py
```

## Results 

Either configuration trains a small GPT-2 model within minutes on a small GPU. On an RTX6000, we obtained the following validation losses:

| Method   | Validation loss |
| -------- | --------------- |
| IVON     | 1.4733          |
| IVON-PCM | 1.4577          |


## Training on OpenWebText

The same training script can also train larger models, such as our GPT-2 run on OpenWebText. To do so, follow the nanoGPT instructions to download the data, then set the hyperparameters as specified in the appendix of our work.
