
# Run

```bash
CUDA_VISIBLE_DEVICES=0 python3 signround_v1.py --model_name facebook/opt-125m --amp --num_bits 4 --group_size -1 --seqlen 512
```

Run with minmax tuning:
```bash
CUDA_VISIBLE_DEVICES=0 python3 signround_v2.py --model_name facebook/opt-125m --amp --num_bits 4 --group_size -1 --seqlen 512 --enable_minmax_tuning
```

To reduce GPU memory usage, enable the `--low_gpu_mem_usage` option. You can also lower the training batch size (`--train_bs`) and raise `--gradient_accumulate_steps` accordingly, so that the number of samples per optimizer step stays the same.
```bash
CUDA_VISIBLE_DEVICES=0 python3 signround_v2.py --model_name facebook/opt-125m --amp --num_bits 4 --group_size -1 --seqlen 512 --low_gpu_mem_usage --train_bs 1 --gradient_accumulate_steps 8 --enable_minmax_tuning
```
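
The trade-off between `--train_bs` and `--gradient_accumulate_steps` comes down to simple arithmetic: their product is the number of samples that contribute to each optimizer step. The sketch below is only an illustration; the helper name `accumulation_steps` and the target of 8 samples per step are assumptions chosen to match the command above, not values taken from the scripts.

```python
def accumulation_steps(effective_batch: int, train_bs: int) -> int:
    """Return the accumulation steps so that train_bs * steps == effective_batch."""
    if effective_batch <= 0 or train_bs <= 0:
        raise ValueError("batch sizes must be positive")
    if effective_batch % train_bs != 0:
        raise ValueError("effective_batch must be a multiple of train_bs")
    return effective_batch // train_bs


# Assumed target: 8 samples per optimizer step (illustrative only).
for bs in (8, 4, 2, 1):
    steps = accumulation_steps(8, bs)
    print(f"--train_bs {bs} --gradient_accumulate_steps {steps}")
```

Smaller values of `--train_bs` lower peak memory per forward/backward pass, at the cost of more passes per optimizer step.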

When rounding models with 30B parameters or more, enabling the `--low_gpu_mem_usage` option is essential. We recommend manually saving the resulting rounded model and evaluating it on at least two GPUs. Alternatively, run the script with two visible GPUs, as shown below.

```bash
CUDA_VISIBLE_DEVICES=0,1 python3 signround_v2.py --model_name facebook/opt-125m --amp --num_bits 4 --group_size -1 --seqlen 512 --low_gpu_mem_usage --train_bs 1 --gradient_accumulate_steps 8 --enable_minmax_tuning
```
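
As a rough illustration of why a single card is often not enough for evaluating a 30B model, the sketch below estimates the memory taken by the weights alone. It assumes 16-bit weights (2 bytes per parameter) and a hypothetical 40 GiB card; activations and caches are ignored, so the result is only a lower bound. The function names are illustrative and do not come from the scripts.

```python
import math


def weight_memory_gib(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed for the weights alone, in GiB (assumes 2 bytes/param by default)."""
    return num_params * bytes_per_param / 1024**3


def min_gpus_for_weights(num_params: float, gpu_mem_gib: float) -> int:
    """Lower bound on the number of cards needed just to hold the weights."""
    return math.ceil(weight_memory_gib(num_params) / gpu_mem_gib)


print(round(weight_memory_gib(30e9), 1))   # 55.9
print(min_gpus_for_weights(30e9, 40.0))    # 2
```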


# Known issues

The latest lm-eval significantly improves the Winogrande baseline for LLaMA v1 and v2 models. To stay consistent with our original data, this release uses an older version of lm-eval. We cannot guarantee that it is identical to the version we originally used, because lm-eval has not updated its version number in a long time and we were unable to recover the exact git commit ID of our version. However, after checking several models, we found that the baselines match or are close to our reported data.

The older version of lm-eval had a bug in the LAMBADA task for LLaMA v1 models. We have fixed the issue and verified that the results match those obtained with the latest lm-eval.

This is a special release prepared on short notice, so it may contain bugs.
