
# LLMGD: Compressing LLMs with Layerwise Geodesic Distances

## Installation

### CUDA

The code was developed and tested with CUDA 11.6.

### Python Dependencies

Install the required Python packages:

```
pip install -r requirements.txt
```

### Julia Setup

After installing the Python packages, install the required Julia packages by running:

```
julia install_julia_packages.jl
```

## Usage

### Configuration Parameters

Set the model and training arguments in `args.py`:

- ``model_name``: Path to pretrained model or model identifier from huggingface.co/models
- ``layer_intervals``: Number of layers to prune.
- ``layer_num_data``: Amount of data used to calculate cosine similarity.
- ``train_num_data``: Amount of data used to train the lightweight model.
- ``batch_size``: Batch size for training.
- ``gradient_accumulation_step``: Number of gradient accumulation steps during training. The effective batch size is the product of ``gradient_accumulation_step`` and ``batch_size``.
- ``epoches``: Number of training epochs.
- ``lr``: Learning rate for training.
- ``min_lr``: Minimum learning rate during training.
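
As a quick illustration of how these parameters relate, the sketch below collects them in a dataclass and derives the effective batch size. The class name `TrainArgs` is hypothetical and every value is a placeholder rather than a project default; the actual layout of `args.py` may differ.

```python
from dataclasses import dataclass


@dataclass
class TrainArgs:
    # Field names follow the parameter list above.
    # Every value here is a placeholder, not a project default.
    model_name: str = "<path-or-hub-id>"
    layer_intervals: int = 8
    layer_num_data: int = 64
    train_num_data: int = 1024
    batch_size: int = 4
    gradient_accumulation_step: int = 8
    epoches: int = 1
    lr: float = 1e-4
    min_lr: float = 1e-6

    @property
    def effective_batch_size(self) -> int:
        # Number of samples contributing to each optimizer update.
        return self.batch_size * self.gradient_accumulation_step


args = TrainArgs(batch_size=4, gradient_accumulation_step=8)
print(args.effective_batch_size)  # 32
```

If GPU memory is tight, lowering ``batch_size`` and raising ``gradient_accumulation_step`` by the same factor keeps the effective batch size unchanged.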

## Layer Pruning Analysis

### LLMGD
To find the best layers to prune with LLMGD, identified by starting layer and interval, run:

```
python llmgd_best_layer.py
```

## ✂️ Layer Pruning

### MSE Loss Training

To train the lightweight network using MSE loss, execute:

```
python mseloss_entry.py
```

Training runs on a single GPU. The pre-trained models and the dataset are downloaded automatically, so you do not need to fetch them manually; expect the first run to take extra time while they download. Make sure enough memory is available, because all hidden states are kept in memory. If you run out of memory, modify the code to use `get_cosine_oomsafe.py`, which avoids the OOM errors.
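
The memory pressure comes from holding every hidden state at once. One general way to avoid that, sketched below, is to reduce each sample to a scalar as soon as it is produced and keep only a running sum. This illustrates the idea only and is not the contents of `get_cosine_oomsafe.py`; `cosine` and `streamed_mean_similarity` are hypothetical names, and plain Python lists stand in for tensors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def streamed_mean_similarity(pairs):
    """Average cosine similarity over an iterable of (h_in, h_out) pairs.

    Each pair is consumed and discarded, so memory use stays constant
    no matter how many samples the iterable yields.
    """
    total, count = 0.0, 0
    for h_in, h_out in pairs:
        total += cosine(h_in, h_out)
        count += 1
    if count == 0:
        raise ValueError("no samples were provided")
    return total / count


def sample_pairs():
    # Toy generator standing in for hidden states produced batch by batch.
    yield [1.0, 0.0], [1.0, 0.0]  # identical direction
    yield [1.0, 0.0], [0.0, 1.0]  # orthogonal


print(streamed_mean_similarity(sample_pairs()))  # 0.5
```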


## Pruning Criteria

There are three pruning strategies: the LLMGD metric, the cosine similarity metric, and the KL divergence metric.

LLMGD metric:
Running `llmgd_best_layer.py` reports the best layer parameter. Then modify the `train_lightweightnetwork` code under `./Layer_Pruning` to run the retraining step with it.

Cosine similarity metric:
Modify the `train_lightweightnetwork` code under `./Layer_Pruning` to use `get_cosine_oomsafe.py` or `get_cosine.py`, depending on how much memory is available.

KL divergence metric:
Modify the `train_lightweightnetwork` code under `./Layer_Pruning` to use `get_kl_divergence_interval.py`.

In all three cases, the part to modify is located under the `def lightweight_model_train` line.
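
To make the idea of choosing layers by starting layer and interval concrete, here is a minimal sketch using KL divergence as the score. It assumes one probability distribution is available per layer boundary and picks the block whose removal would change that distribution least. The function names and the toy inputs are hypothetical; this is not the logic of `get_kl_divergence_interval.py`.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions, assuming q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def best_start_layer(dists, interval):
    """Return (start, score) for the block of `interval` layers with the lowest KL.

    dists[i] is assumed to be a distribution derived from the state
    entering layer i, so dists[start] and dists[start + interval]
    bracket the candidate block.
    """
    scores = {
        start: kl_divergence(dists[start], dists[start + interval])
        for start in range(len(dists) - interval)
    }
    if not scores:
        raise ValueError("interval is too large for the number of layers")
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy distributions at four layer boundaries (illustrative values only).
toy = [[0.9, 0.1], [0.5, 0.5], [0.45, 0.55], [0.1, 0.9]]
start, score = best_start_layer(toy, interval=1)
print(start)  # 1
```

For a similarity score such as cosine similarity, the same loop applies with `max` in place of `min`.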

## Performance Evaluation

We use lm_evaluation_harness to evaluate model performance. The model weights are stored under `./facebook`. To evaluate on several tasks, see the official lm_eval tutorials.
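
As a rough sketch of what an evaluation call can look like from Python, the snippet below builds the arguments for `lm_eval.simple_evaluate`. It assumes lm-evaluation-harness version 0.4 or later, where the Hugging Face backend is named `hf`. The helper names are hypothetical, and the model directory is left as an argument because the exact subfolder under `./facebook` depends on your run.

```python
def build_eval_request(model_dir, tasks, batch_size=8):
    """Assemble keyword arguments for lm_eval.simple_evaluate (helper name is hypothetical)."""
    return {
        "model": "hf",  # Hugging Face backend; name assumed from lm_eval >= 0.4
        "model_args": f"pretrained={model_dir}",
        "tasks": list(tasks),
        "batch_size": batch_size,
    }


def run_eval(model_dir, tasks):
    """Run the harness and return its per-task results dictionary."""
    import lm_eval  # imported lazily; requires lm-evaluation-harness to be installed

    output = lm_eval.simple_evaluate(**build_eval_request(model_dir, tasks))
    return output["results"]
```

A call such as `run_eval(model_dir, ["hellaswag", "arc_easy"])` would evaluate the saved model on two tasks, with `model_dir` pointing at the directory that holds the pruned weights.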
