# Install
`pip install torch huggingface_hub datasets transformers wandb liger-kernel torchdata jsonargparse lm_eval`
Follow the instructions at https://github.com/KellerJordan/Muon to install Muon.

# Train
## Data
- Download data using `download_ds.py`
- Process data using `preprocess_data_packing.py`. For example: `python preprocess_data_packing.py --out_path="llama_1b_packed_nemotron_cc_math_v1_4plus_wrapped_packing" --dataset_location="./datasets/Nemotron-CC-Math-v1-4plus"`
- Remember to tokenize with the right tokenizer for the model you want to train!
- Optional: pack data using `preprocess_data_packing.py`
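
For readers unfamiliar with packing, the idea can be sketched as follows. This is an illustrative sketch only, not the logic of `preprocess_data_packing.py`: the function name `pack_sequences`, the `eos_id` separator, and the choice to drop the trailing partial block are all assumptions made for the example.

```python
from typing import Iterable, List


def pack_sequences(docs: Iterable[List[int]], max_length: int, eos_id: int) -> List[List[int]]:
    """Concatenate tokenized documents and cut the stream into fixed-size blocks.

    Hypothetical helper: each document is followed by `eos_id`, and any
    leftover tokens that do not fill a whole block are dropped.
    """
    buffer: List[int] = []
    blocks: List[List[int]] = []
    for ids in docs:
        buffer.extend(ids)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= max_length:
            blocks.append(buffer[:max_length])
            buffer = buffer[max_length:]
    return blocks


# Toy token ids; real ids would come from the tokenizer of the model being trained.
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
packed = pack_sequences(docs, max_length=4, eos_id=0)
print(packed)
```

Packing avoids padding, so every training sequence is exactly `max_length` tokens long, at the cost of documents occasionally being split across blocks.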

## Launch
Note: this example uses Huginn-0125 as we cannot submit model weights during submission.
Our converted models are generated with the code in `./make_recurrent_model`.
To use ShortGPT to prune layers, follow https://github.com/sramshetty/ShortGPT

```
torchrun --nproc_per_node=4 train.py --epochs=1 --max_length=1024 --eval_interval=10000000000 --out_path=huginn_llama --optim_config.lr=5e-5 --preprocessed_data_path="datasets/your_dataset" --is_parquet_dataset=true --scheduler_args.cooldown=0.6 --scheduler_args.warmup=0.005 --max_grad_norm=1.0 --no_amp=false --micro_batch_size=1 --batch_size=4 --max_steps=1000 --run_name=testing --wandb_disabled=true
```
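
To get a feel for the `--scheduler_args.warmup` and `--scheduler_args.cooldown` values in the command above, here is a rough sketch of a warmup, constant, cooldown schedule. It assumes both values are fractions of `max_steps` and that the ramps are linear; `train.py` may interpret them differently, and `lr_at_step` is a hypothetical name used only for this illustration.

```python
def lr_at_step(step: int, max_steps: int, base_lr: float,
               warmup: float = 0.005, cooldown: float = 0.6) -> float:
    """Learning rate at a given step (assumed linear warmup and linear cooldown)."""
    warmup_steps = round(warmup * max_steps)      # 5 steps for max_steps=1000
    cooldown_steps = round(cooldown * max_steps)  # 600 steps for max_steps=1000
    cooldown_start = max_steps - cooldown_steps
    if step < warmup_steps:
        # ramp up from base_lr / warmup_steps to base_lr
        return base_lr * (step + 1) / warmup_steps
    if step >= cooldown_start:
        # decay linearly towards zero over the final cooldown_steps
        return base_lr * (max_steps - step) / cooldown_steps
    return base_lr


for s in (0, 4, 200, 700, 999):
    print(s, lr_at_step(s, max_steps=1000, base_lr=5e-5))
```

Under these assumptions, the example run would warm up for 5 steps, hold the peak learning rate until step 400, and spend the remaining 600 steps cooling down.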

# Evals
We use the lm_eval harness. Example commands:
```
OUT_ROOT="lm_evals"
MODEL_PATH=
chkpt=
MEAN_RECURRENCE=
lm_eval --model hf \
    --model_args pretrained=${MODEL_PATH}/model_only_chkpt_${chkpt},mean_recurrence=${MEAN_RECURRENCE},add_bos_token=True,dtype="float32",trust_remote_code=True,max_length=1024 \
    --tasks gsm8k \
    --device cuda \
    --output_path "${OUT_ROOT}/${MODEL_PATH}/model_only_chkpt_${chkpt}" \
    --batch_size 32 --num_fewshot 1
lm_eval --model hf \
    --model_args pretrained=${MODEL_PATH}/model_only_chkpt_${chkpt},mean_recurrence=${MEAN_RECURRENCE},add_bos_token=True,dtype="float32",trust_remote_code=True \
    --tasks hellaswag,arc_easy,arc_challenge,mmlu,openbookqa,piqa,social_iqa,winogrande \
    --device cuda \
    --output_path "${OUT_ROOT}/${MODEL_PATH}/model_only_chkpt_${chkpt}" \
    --batch_size auto
```
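
Once runs have finished, the scores can be gathered with a small script like the one below. It is a sketch under stated assumptions: it expects each run to have written a `results*.json` file somewhere under the output root with a top-level `results` key mapping task names to metrics, which may differ between harness versions. `collect_results` is a hypothetical helper and is not part of this repository.

```python
import json
from pathlib import Path


def collect_results(out_root: str) -> dict:
    """Return {results_file_path: {task: metrics}} for every results JSON under out_root."""
    collected = {}
    for path in sorted(Path(out_root).rglob("results*.json")):
        with open(path) as f:
            payload = json.load(f)
        # Assumed layout: per-task metrics live under the "results" key.
        collected[str(path)] = payload.get("results", {})
    return collected
```

For example, `collect_results("lm_evals")` would return one entry per results file found under the `OUT_ROOT` used in the commands above.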

To evaluate model loss at different recurrences:
`python multi_recurence_eval.py --model_name /huginn_llama/my_model --ckpts [1000]`

# Plotting
Use `plot_evals.py`
