# UwULLM

A codebase/framework for training large language models.

In TIPO, we use this framework to train our models (TIPO-200M and TIPO-500M).

## Usage

To run the training, download the TIPO dataset from
https://huggingface.co/datasets/TIPO-Anonymous/TIPO-dataset
and put all the parquet files under `...trainer-repo/src/dataset/tipo/...parquet`.
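The download step can be sketched in Python as follows. This assumes the `huggingface_hub` package is installed and that the script runs from the trainer-repo root, so the relative path `src/dataset/tipo` points at the directory named above. `download_tipo_dataset` and `list_parquet_files` are hypothetical helpers and are not part of this repository. `snapshot_download` keeps the dataset repository's folder layout, so parquet files may end up in subfolders of `src/dataset/tipo`. Check whether the trainer expects them flat.

```python
from pathlib import Path

# Assumption: run from the trainer-repo root, so this resolves to
# ...trainer-repo/src/dataset/tipo/
DATASET_DIR = Path("src/dataset/tipo")
REPO_ID = "TIPO-Anonymous/TIPO-dataset"


def download_tipo_dataset(target_dir: Path = DATASET_DIR) -> Path:
    """Hypothetical helper: fetch only the parquet files of the TIPO dataset."""
    # Imported lazily so list_parquet_files works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",          # the TIPO data is a dataset repo, not a model repo
        local_dir=target_dir,         # files are written here, keeping the repo's layout
        allow_patterns=["*.parquet"], # skip README and other non-parquet files
    )
    return target_dir


def list_parquet_files(target_dir: Path = DATASET_DIR) -> list[Path]:
    """Hypothetical helper: return every parquet file under target_dir, recursively."""
    files = sorted(target_dir.rglob("*.parquet"))
    if not files:
        raise FileNotFoundError(f"no .parquet files found under {target_dir}")
    return files


if __name__ == "__main__":
    download_tipo_dataset()
    for path in list_parquet_files():
        print(path)
```

When run as a script, it prints each parquet path it placed. It raises `FileNotFoundError` if the directory is still empty, which lets you catch a bad download before starting training.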

Then, you can install and run the training script:

```bash
cd .../trainer-repo
python -m pip install -e .
python ./scripts/test_train.py ./config/train/tipo-200m.toml
```
