# scaling_law_EHR_foundation_model


## Setup

**On CUDA**

```bash
pip install --index-url https://download.pytorch.org/whl/nightly/cu118 --pre 'torch>=2.1.0dev'
```

**On CPU (including Macs)**

```bash
pip install --index-url https://download.pytorch.org/whl/nightly/cpu --pre 'torch>=2.1.0dev'
```

**(Optional) install Flash Attention 2**

```bash
MAX_JOBS=4 pip install 'flash-attn>=2.0.0.post1' --no-build-isolation
```

Next, install the dependencies from `requirements.txt`, along with the optional `tokenizers` and `sentencepiece` packages:

```bash
pip install -r requirements.txt tokenizers sentencepiece
```
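
To confirm that the installs above landed in the active environment, a quick check like the following may help. It is a minimal sketch, not part of this repository: the helper names `installed_version` and `report` are hypothetical, and the package list simply mirrors the names used in the install commands above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Assumed list: the packages named in the setup commands of this README.
    for name, version in report(
        ["torch", "flash-attn", "tokenizers", "sentencepiece"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

A missing `flash-attn` is expected if the optional Flash Attention 2 step was skipped.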
## Conduct pretraining
Submit the pretraining job to Slurm with `sbatch nhird_flop.sh`.

## Generate finetuning dataset
Split the pre-tokenized patients into training, validation, and test partitions:
```bash
cd scripts
sbatch extract_ft_dataset.sh
```
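
For illustration, a patient-level split can be made deterministic by hashing each patient ID, so that a given patient always lands in the same partition. The sketch below is an assumption about one reasonable approach, not a description of what `extract_ft_dataset.sh` does: the function names and the 10% validation and 10% test fractions are hypothetical.

```python
import hashlib


def assign_partition(patient_id, valid_frac=0.1, test_frac=0.1):
    """Assign a patient to 'train', 'valid', or 'test' from a hash of the ID.

    The fractions are illustrative defaults, not values taken from this repo.
    """
    digest = hashlib.sha256(str(patient_id).encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + valid_frac:
        return "valid"
    return "train"


def split_patients(patient_ids, valid_frac=0.1, test_frac=0.1):
    """Group patient IDs by partition; each ID appears in exactly one list."""
    partitions = {"train": [], "valid": [], "test": []}
    for pid in patient_ids:
        partitions[assign_partition(pid, valid_frac, test_frac)].append(pid)
    return partitions


if __name__ == "__main__":
    parts = split_patients(range(1000))
    print({name: len(ids) for name, ids in parts.items()})
```

Splitting by patient rather than by record keeps all of a patient's data in a single partition, which avoids leakage between training and evaluation.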
## Finetuning
```bash
sbatch nhird_finetune_litgpt_pretrained.sh
```
