# Attribution
This repository is based on [nanoGPT](https://github.com/karpathy/nanoGPT) by **Andrej Karpathy**, licensed under the **MIT License**. Modifications have been made for this project.

---

# Installation

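Install the required Python packages. Run the two commands in order: `flash-attn` builds against the PyTorch already installed in your environment (hence `--no-build-isolation`), and `ninja` speeds up its compilation if no prebuilt wheel matches your setup: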

```bash
pip install torch numpy wandb tqdm gdown packaging ninja
pip install flash-attn --no-build-isolation
```
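To confirm that every package above installed, and to see which versions pip resolved, you can run a quick check like the one below. It is a hypothetical helper, not part of this repository. It assumes a Python interpreter with `pip` is on your `PATH` (override it with the `PYTHON` variable) and reports anything missing as `not installed`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): print the installed version
# of each package from the install commands, or "not installed".
pkg_version() {
  local ver py
  # Use $PYTHON if set, else the first of python/python3 found on PATH
  py="${PYTHON:-$(command -v python || command -v python3)}"
  # `pip show` prints a "Version: X.Y.Z" line for installed distributions
  ver="$("$py" -m pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}')"
  echo "${ver:-not installed}"
}

for pkg in torch numpy wandb tqdm gdown packaging ninja flash-attn; do
  echo "$pkg: $(pkg_version "$pkg")"
done
```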

---

# Environment / Tested On
* **GPU:** NVIDIA A40  
* **CUDA:** 12.2  
* **PyTorch:** 2.5.1+cu121  
* **FlashAttention:** 2.7.4
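To compare your machine with the configuration above, the sketch below reports the GPU model and driver version from `nvidia-smi`, plus the CUDA version your PyTorch wheel was built with (`torch.version.cuda`, e.g. `12.1` for a `+cu121` build, which is not necessarily the same as the system CUDA version). It is a hypothetical helper, not part of the repo, and prints a message instead of failing when a tool is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): summarize the GPU/CUDA stack.
report_cuda_stack() {
  # Use $PYTHON if set, else the first of python/python3 found on PATH
  local py="${PYTHON:-$(command -v python || command -v python3)}"
  # GPU model and NVIDIA driver version (nvidia-smi ships with the driver)
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader 2>/dev/null \
    || echo "nvidia-smi not available"
  # PyTorch version, the CUDA version it was built with, and device visibility
  "$py" - <<'EOF'
try:
    import torch
except ImportError:
    print("torch: not installed")
else:
    print("torch:", torch.__version__)
    print("torch built with CUDA:", torch.version.cuda)
    if torch.cuda.is_available():
        print("visible GPU:", torch.cuda.get_device_name(0))
    else:
        print("visible GPU: none")
EOF
}

report_cuda_stack
```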

---

# Getting Started
### 1. Download the dataset
```bash
gdown --fuzzy "https://drive.google.com/file/d/1vCEAnxuuIYC6tMXFoeR0MLPbV3oKLf0k/view?usp=sharing"
```

### 2. Unzip the dataset
```bash
unzip pg19.zip
```
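If extraction fails, an interrupted or partial download is a common cause. The hypothetical helper below (not part of the repo) checks that the archive exists, tests its integrity with `unzip -t` without extracting anything, and prints the file count and total size from the summary line of `unzip -l`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): sanity-check a downloaded zip.
check_archive() {
  local archive="${1:-pg19.zip}"
  if [ ! -f "$archive" ]; then
    echo "$archive not found; re-run the gdown step" >&2
    return 1
  fi
  # -t tests every member's CRC without writing files; -q keeps output short
  unzip -tq "$archive" || return 1
  # The last line of `unzip -l` is the total uncompressed size and file count
  unzip -l "$archive" | tail -n 1
}
```

For example, `check_archive pg19.zip && unzip pg19.zip` extracts only if the check passes.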

### 3. Train GPT-2 with FlashAttention
```bash
python train_FlashAttn.py
```
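Before committing to a long run, it can be worth confirming that the FlashAttention kernels actually execute on your GPU. The sketch below is a hypothetical pre-flight check, not part of the repo, and its tensor sizes are arbitrary. It runs one causal forward pass with `flash_attn.flash_attn_func` and compares the result with PyTorch's `scaled_dot_product_attention`. Note that the layouts differ: `flash_attn_func` takes `(batch, seqlen, nheads, headdim)` tensors in fp16 or bf16, while SDPA takes `(batch, nheads, seqlen, headdim)`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this repo): one FlashAttention
# forward pass compared against PyTorch's reference attention.
flash_smoke_test() {
  # Use $PYTHON if set, else the first of python/python3 found on PATH
  local py="${PYTHON:-$(command -v python || command -v python3)}"
  "$py" - <<'EOF'
try:
    import torch
    import torch.nn.functional as F
    from flash_attn import flash_attn_func
except Exception as exc:  # missing package or a broken CUDA extension build
    print(f"skipped: {exc}")
else:
    if not torch.cuda.is_available():
        print("skipped: no CUDA device visible")
    else:
        torch.manual_seed(0)
        # flash_attn_func takes (batch, seqlen, nheads, headdim) in fp16/bf16
        q, k, v = (torch.randn(2, 128, 4, 64, device="cuda", dtype=torch.float16)
                   for _ in range(3))
        out = flash_attn_func(q, k, v, causal=True)
        # SDPA takes (batch, nheads, seqlen, headdim), hence the transposes
        ref = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
        ).transpose(1, 2)
        print("max abs diff vs SDPA:", (out - ref).abs().max().item())
EOF
}

flash_smoke_test
```

Small differences at the level of fp16 rounding are expected. An import or CUDA error at this step suggests a problem with the `flash-attn` build or the driver rather than with the training script.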

### 4. Train GPT-2 with LS Attention
```bash
python train_LSAttn.py
```
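To compare the two variants like-for-like, you may want to run them one after another on the same GPU and keep each run's console output. The helper below is a hypothetical sketch, not part of the repo. It pins the run to one device via `CUDA_VISIBLE_DEVICES` (default `0`, overridable with a `GPU` variable), mirrors output to a log file with `tee`, and returns the training script's own exit status rather than `tee`'s.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): run one training script on a
# single GPU, mirror its output to a log file, and keep the script's exit code.
run_logged() {
  local script="$1" log="$2"
  # Use $PYTHON if set, else the first of python/python3 found on PATH
  local py="${PYTHON:-$(command -v python || command -v python3)}"
  # CUDA_VISIBLE_DEVICES restricts the process to one GPU (default: GPU 0)
  CUDA_VISIBLE_DEVICES="${GPU:-0}" "$py" "$script" 2>&1 | tee "$log"
  # PIPESTATUS[0] is python's status; plain $? would be tee's
  return "${PIPESTATUS[0]}"
}
```

For example, `run_logged train_FlashAttn.py flashattn.log && run_logged train_LSAttn.py lsattn.log` runs the LS Attention variant only if the FlashAttention run exits cleanly. The log file names are arbitrary.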


