# GPT-2 Experiments for SoLoRA

**LoRA Meets Second-Order Optimization: Towards Optimal Low-Rank Updates**

This repository builds on the Riemannian Preconditioned LoRA project [(Zhang and Pilanci, 2024)](https://arxiv.org/abs/2402.02347).

## Repository Overview

* [examples/NLG/src/](examples/NLG/src) contains the source code used for data processing, training, and decoding.
* [examples/NLG/eval/](examples/NLG/eval) contains the task-specific evaluation scripts.
* [examples/NLG/vocab/](examples/NLG/vocab) contains the GPT-2 vocabulary files.
* [loralib/](loralib) contains the LoRA library implementation.

## Requirements

See the [LoRA](https://github.com/microsoft/LoRA/tree/main) repository for requirements.

## Quickstart

Clone the repository and run the following commands from its root:

```
cd examples/NLG
pip install -r requirement.txt
bash download_pretrained_checkpoints.sh
bash create_datasets.sh
cd ./eval
bash download_evalscript.sh
cd ../../..
python setup.py develop
sudo apt-get install default-jre
```

## E2E Experiment

Run all commands in this section from the experiment folder:

```
cd examples/NLG
```

1. Train the GPT-2 small model with the SoLoRA optimizer (see our paper for hyperparameters):

```
python -m torch.distributed.launch --nproc_per_node=1  src/gpt2_ft.py  \
   --train_data ./data/e2e/train.jsonl   \
   --valid_data ./data/e2e/valid.jsonl  \
   --train_batch_size 8  \
   --grad_acc 1   \
   --valid_batch_size 4  \
   --seq_len 512  \
   --model_card gpt2.sm \
   --init_checkpoint ./pretrained_checkpoints/gpt2-pytorch_model.bin  \
   --platform local  \
   --clip 0.0  \
   --lr 3e-4 \
   --weight_decay 0.01 \
   --correct_bias   \
   --adam_beta1 0.9 \
   --adam_beta2 0.98 \
   --scheduler linear  \
   --warmup_step 500 \
   --max_epoch 5   \
   --save_interval 20000  \
   --lora_dim 64  \
   --lora_alpha 128  \
   --lora_dropout 0.1  \
   --label_smooth 0.1   \
   --work_dir ./trained_models/GPT2_S/e2e  \
   --random_seed 110  \
   --trial_name solora_experiment_r64 \
   --opt solora
```

The valid choices for `--opt` are `sgd`, `scaled_gd`, `lora_pro_sgd`, `solora_sgd`, `adamw`, `scaled_adamw`, `lora_pro_adamw`, and `solora`.
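To compare several `--opt` choices, it can help to generate one training command per optimizer. The following is a minimal Python sketch, not part of the repository: the helper `build_train_command` and the trial naming scheme `<opt>_experiment_r<lora_dim>` are assumptions made for illustration, and the flags are copied from the training command above. It only prints the commands; the learning rate and other hyperparameters should still be set per optimizer as described in the paper.

```python
import shlex

# Optimizer names copied from the list of valid `--opt` choices.
OPTIMIZERS = [
    "sgd", "scaled_gd", "lora_pro_sgd", "solora_sgd",
    "adamw", "scaled_adamw", "lora_pro_adamw", "solora",
]


def build_train_command(opt, lora_dim=64, lora_alpha=128, lr="3e-4"):
    """Return the training command for one optimizer as an argument list.

    Hypothetical helper: the defaults mirror the example command in this
    README and are not tuned for each optimizer.
    """
    if opt not in OPTIMIZERS:
        raise ValueError(f"unknown optimizer: {opt!r}")
    # Assumed naming convention, chosen to match `solora_experiment_r64`.
    trial_name = f"{opt}_experiment_r{lora_dim}"
    return [
        "python", "-m", "torch.distributed.launch", "--nproc_per_node=1",
        "src/gpt2_ft.py",
        "--train_data", "./data/e2e/train.jsonl",
        "--valid_data", "./data/e2e/valid.jsonl",
        "--train_batch_size", "8",
        "--grad_acc", "1",
        "--valid_batch_size", "4",
        "--seq_len", "512",
        "--model_card", "gpt2.sm",
        "--init_checkpoint", "./pretrained_checkpoints/gpt2-pytorch_model.bin",
        "--platform", "local",
        "--clip", "0.0",
        "--lr", str(lr),
        "--weight_decay", "0.01",
        "--correct_bias",
        "--adam_beta1", "0.9",
        "--adam_beta2", "0.98",
        "--scheduler", "linear",
        "--warmup_step", "500",
        "--max_epoch", "5",
        "--save_interval", "20000",
        "--lora_dim", str(lora_dim),
        "--lora_alpha", str(lora_alpha),
        "--lora_dropout", "0.1",
        "--label_smooth", "0.1",
        "--work_dir", "./trained_models/GPT2_S/e2e",
        "--random_seed", "110",
        "--trial_name", trial_name,
        "--opt", opt,
    ]


if __name__ == "__main__":
    # Dry run: print one shell-quoted command per optimizer.
    for name in OPTIMIZERS:
        print(shlex.join(build_train_command(name)))
```

Printing the commands rather than launching them keeps the sketch safe to run anywhere; each printed line can be pasted into a shell from `examples/NLG`.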

2. Generate outputs with beam search from the trained checkpoint:

```
python -m torch.distributed.launch --nproc_per_node=1 src/gpt2_beam.py \
    --data ./data/e2e/test.jsonl \
    --batch_size 1 \
    --seq_len 512 \
    --eval_len 64 \
    --model_card gpt2.sm \
    --init_checkpoint ./trained_models/GPT2_S/e2e/model_solora_experiment_r64.26290.pt \
    --platform local \
    --lora_dim 64 \
    --lora_alpha 128 \
    --beam 10 \
    --length_penalty 0.8 \
    --no_repeat_ngram_size 4 \
    --repetition_penalty 1.0 \
    --eos_token_id 628 \
    --work_dir ./trained_models/GPT2_S/e2e \
    --output_file predict_e2e_solora_experiment_r64.jsonl
```
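The step number in the checkpoint filename (`26290` above) depends on the dataset size, batch size, and number of epochs, so it may differ in your run. The sketch below picks the checkpoint with the highest step in a work directory. It assumes checkpoints are named `model_<trial_name>.<step>.pt`, a pattern inferred from the single filename shown above; `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path


def latest_checkpoint(work_dir, trial_name):
    """Return the checkpoint path with the largest step, or None if none exist.

    Assumes the naming pattern model_<trial_name>.<step>.pt.
    """
    pattern = re.compile(rf"^model_{re.escape(trial_name)}\.(\d+)\.pt$")
    best_step, best_path = -1, None
    for path in Path(work_dir).glob("*.pt"):
        match = pattern.match(path.name)
        if match is None:
            continue  # a different trial or an unrelated file
        step = int(match.group(1))
        if step > best_step:
            best_step, best_path = step, path
    return best_path


if __name__ == "__main__":
    checkpoint = latest_checkpoint(
        "./trained_models/GPT2_S/e2e", "solora_experiment_r64"
    )
    print(checkpoint if checkpoint is not None else "no checkpoint found")
```

The printed path can then be passed to `--init_checkpoint` in the generation command.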

3. Decode the outputs from step (2):

```
python src/gpt2_decode.py \
    --vocab ./vocab \
    --sample_file ./trained_models/GPT2_S/e2e/predict_e2e_solora_experiment_r64.jsonl \
    --input_file ./data/e2e/test_formatted.jsonl \
    --output_ref_file e2e_ref.txt \
    --output_pred_file e2e_pred.txt
```
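Before running the evaluation, a quick consistency check on the two decoded files can catch a truncated or mismatched run. The sketch below rests on an assumption about the file layout: `e2e_ref.txt` holds one group of reference lines per test input, with groups separated by blank lines, and `e2e_pred.txt` holds one prediction per line. If your copy of `gpt2_decode.py` writes a different layout, this check does not apply. Both function names are hypothetical.

```python
from pathlib import Path


def count_reference_groups(text):
    """Count blocks of consecutive non-empty lines (assumed: one block per input)."""
    groups, in_group = 0, False
    for line in text.splitlines():
        if line.strip():
            if not in_group:
                groups += 1
                in_group = True
        else:
            in_group = False
    return groups


def count_predictions(text):
    """Count lines (assumed: one prediction per line, possibly empty)."""
    return len(text.splitlines())


if __name__ == "__main__":
    ref_path, pred_path = Path("e2e_ref.txt"), Path("e2e_pred.txt")
    if ref_path.exists() and pred_path.exists():
        n_refs = count_reference_groups(ref_path.read_text(encoding="utf8"))
        n_preds = count_predictions(pred_path.read_text(encoding="utf8"))
        status = "OK" if n_refs == n_preds else "MISMATCH"
        print(f"{status}: {n_refs} reference groups, {n_preds} predictions")
    else:
        print("decoded files not found; run the decode step first")
```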

4. Run the evaluation on the E2E test set:

```
python eval/e2e/measure_scores.py e2e_ref.txt e2e_pred.txt -p
```
