# Advancing SVD-based LLM Compression via Layer-Wise Error Model Search

This is the official repository for the paper **Advancing SVD-based LLM Compression via Layer-Wise Error Model Search**.

<details>
  <summary>
  <font size="+1">Abstract</font>
  </summary>
Low-rank SVD-based compression offers a powerful strategy to reduce the computational costs of Large Language Models (LLMs); however, existing methods commonly encounter two recurring obstacles: (i) global rank allocation, where uncalibrated error proxies fail to account for complex error propagation, and (ii) decomposition quality, where Fisher-based estimators suffer from severe rank collapse. In this work, we address these limitations by presenting Layer-wise Error Modeling Search (LEMS) and KFAC-SVD. LEMS advances rank allocation by introducing a layer-wise error surrogate that integrates both local and global layer importance alongside a propagation bias, allowing us to determine global rank configurations efficiently as an Integer Linear Program (ILP). Simultaneously, KFAC-SVD improves decomposition quality by utilizing token-wise statistics, preventing the rank deficiency observed in prior Fisher-based SVD. We demonstrate across Mistral, Qwen3, and Llama-3 families that KFAC-SVD achieves an average perplexity improvement of 15%, while LEMS consistently outperforms existing search strategies, delivering significant zero-shot accuracy improvements of up to 4.7 p.p. that generalize to scales of 70B parameters.
</details>

## Model Compression Results
We compare our search method, **LEMS**, against state-of-the-art search baselines across modern LLMs at two compression ratios (0.8 and 0.6).

### Main Results of LEMS (Mistral, Llama-3, Qwen3)

| Ratio | Search Method | **Mistral-7B** <br> Wiki / Acc | **Llama3-8B** <br> Wiki / Acc | **Qwen3-8B** <br> Wiki / Acc |
| :---: | :--- | :---: | :---: | :---: |
| **-** | **Baseline** | 5.25 / 63.95 | 6.14 / 63.34 | 9.71 / 62.03 |
| **0.8** | Uniform | 7.14 / 52.40 | 11.44 / 47.70 | 12.52 / 53.51 |
| | ASVD | 7.22 / 47.99 | 12.97 / 46.02 | 15.73 / 47.31 |
| | SVD-LLMv2 | 7.36 / 51.77 | 11.66 / 47.59 | 12.57 / 53.08 |
| | MRCS | 7.03 / 52.20 | 11.83 / 46.29 | 14.21 / 50.89 |
| | ARS | 7.26 / 54.58 | 11.81 / 54.31 | 11.58 / 55.66 |
| | ATP | 7.14 / 52.40 | 11.07 / 51.55 | 12.52 / 53.51 |
| | **LEMS (Ours)** | **5.98 / 57.50** | **8.19 / 55.76** | **10.38 / 58.82** |
| **0.6** | Uniform | 14.38 / 39.07 | 48.56 / 34.33 | 21.68 / 39.65 |
| | ASVD | 16.81 / 35.17 | 75.51 / 34.03 | 29.74 / 35.95 |
| | SVD-LLMv2 | 14.27 / 38.73 | 50.38 / 34.14 | 22.46 / 39.43 |
| | MRCS | 10.52 / 41.61 | 32.92 / 35.44 | 32.54 / 38.57 |
| | ARS | 19.43 / 40.83 | 28.77 / 40.57 | 29.51 / 40.72 |
| | ATP | 13.59 / 40.22 | 25.14 / 39.99 | 19.47 / 40.33 |
| | **LEMS (Ours)** | **10.71 / 44.00** | **18.22 / 42.99** | **15.40 / 45.49** |

*Note: Lower Wiki (Perplexity) is better. Higher Acc (Accuracy) is better.*

### SVD Method Comparison (Uniform)

We further analyze the effectiveness of our **KFAC-SVD** decomposition against other activation- and Fisher-based SVD methods. The table below compares these methods under a fixed **Uniform** compression strategy, as well as the combination of KFAC-SVD and LEMS.

| Ratio | SVD Method | **Mistral-7B** <br> Wiki / Acc | **Llama3-8B** <br> Wiki / Acc | **Qwen3-8B** <br> Wiki / Acc |
| :---: | :--- | :---: | :---: | :---: |
| **-** | **Baseline** | 5.25 / 63.95 | 6.14 / 63.34 | 9.71 / 62.03 |
| **0.9** | FWSVD | 9.47 / 56.87 | 42.03 / 49.05 | 17.15 / 53.70 |
|  | ASVD | 9.14 / 57.25 | 65.09 / 47.90 | 20.36 / 51.19 |
|  | SVD-LLM | 6.46 / 56.81 | 10.14 / 52.28 | 12.52 / 55.91 |
|  | SVD-LLMv2 | 6.46 / 56.76 | 10.18 / 52.19 | 12.52 / 55.80 |
|  | DOBI-SVD | 7.11 / 55.16 | 11.22 / 53.11 | 13.46 / 55.48 |
|  | GFWSVD | 65.6 / 36.28 | 6562 / 31.71 | 642.1 / 41.85 |
|  | **KFAC-SVD (Ours)** | 6.22 / 56.91 | 8.84 / 54.41 | 11.51 / 57.04 |
|  | **+ LEMS (Full)** | **5.37 / 62.35** | **6.58 / 62.04** | **9.88 / 62.68** |
| **0.7** | FWSVD | 34.77 / 44.22 | 716.3 / 33.49 | 41.46 / 42.67 |
|  | ASVD | 28.16 / 46.07 | 10989 / 33.00 | 70.66 / 43.09 |
|  | SVD-LLM | 10.92 / 44.42 | 34.64 / 37.71 | 17.11 / 45.70 |
|  | SVD-LLMv2 | 10.96 / 44.40 | 34.98 / 37.65 | 17.15 / 45.63 |
|  | DOBI-SVD | 12.33 / 42.96 | 31.48 / 38.51 | 18.54 / 46.20 |
|  | GFWSVD | 2934 / 31.29 | 89881 / 31.61 | 30820 / 32.13 |
|  | **KFAC-SVD (Ours)** | 9.30 / 45.80 | 19.43 / 40.67 | 14.69 / 46.81 |
|  | **+ LEMS (Full)** | **7.42 / 51.51** | **11.05 / 49.76** | **11.88 / 53.56** |

## Reproduce Results

To compress a model, you can combine any of the provided decomposition (SVD) and search methods. The sections below give example commands that replicate the results of the paper.

### Setup & Environment
#### Docker Setup
We strongly encourage using the dockerized environment for reproduction.
```bash
cd docker
docker build -t lems:torch2.2.1 .
# Mount your HF cache or dataset folder as needed
docker run --gpus 'all' -it --name="LLM_COMPRESS" -v ~/.cache/huggingface:/root/.cache/huggingface -v ./:/workspace lems:torch2.2.1
```
In this environment, you will be able to run the commands specified below. 

#### C4 Evaluation
Evaluating on the C4 dataset requires additional steps. Please refer to ```./local_datasets/c4/readme.md``` to make the dataset available manually. **Without this step, ```--extended_eval``` will fail!**

#### ILP Solver Licenses
All LEMS results in the paper were obtained using Gurobi as the ILP solver. Running Gurobi requires a license ([free for educational institutions](https://www.gurobi.com/academia/academic-program-and-licenses/)). If you have one, put it in ```gurobi_license.json``` using the format ```{"WLSACCESSID": "your id", "WLSSECRET": "your secret", "LICENSEID": your_id}```.
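
Before starting a long run, it can help to check that the license file is well formed. The following is a minimal sketch that only relies on the format shown above; the helper name `load_gurobi_license` is hypothetical and not part of this repository, and how the repository reads the file internally may differ. The check that `LICENSEID` is a number is an assumption based on it being unquoted in the format above.

```python
import json
from pathlib import Path

# Keys taken from the format documented above.
REQUIRED_KEYS = ("WLSACCESSID", "WLSSECRET", "LICENSEID")


def load_gurobi_license(path="gurobi_license.json"):
    """Hypothetical helper: load the license file and check its keys."""
    params = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise KeyError(f"missing keys in {path}: {missing}")
    # Assumption: LICENSEID is unquoted in the documented format,
    # so it is expected to be a JSON number rather than a string.
    license_id = params["LICENSEID"]
    if isinstance(license_id, bool) or not isinstance(license_id, int):
        raise TypeError("LICENSEID should be an integer, not a string")
    return params
```

A dictionary with these three keys is also the form that `gurobipy.Env(params=...)` accepts for Web License Service credentials, which may be useful if you want to test the license outside of this repository.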

Alternatively, you can use the **experimental PuLP/CBC implementation**, which **does not require a license**, by passing ```--solver="cbc"```. In preliminary testing it also yielded good results, although we do not guarantee that it will match the numbers obtained with Gurobi. To make the problem smaller and faster to solve with CBC, we recommend reducing the number of variables by raising ```--enforce_rank_multiples_of``` to a large value such as 64.
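
To see why a larger rank multiple shrinks the problem, consider the rough count below. It is a sketch under stated assumptions, not the repository's actual formulation: it assumes one binary decision variable per layer and candidate rank, and the layer count and maximum rank are hypothetical values chosen for illustration.

```python
def candidate_ranks(max_rank, multiple):
    """Ranks a layer may take when restricted to multiples of `multiple`."""
    return list(range(multiple, max_rank + 1, multiple))


def num_ilp_variables(layer_max_ranks, multiple):
    """Assumed formulation: one binary variable per (layer, candidate rank)."""
    return sum(len(candidate_ranks(r, multiple)) for r in layer_max_ranks)


# Hypothetical model: 224 linear layers, each with a maximum rank of 4096.
layers = [4096] * 224
for multiple in (8, 64):
    print(multiple, num_ilp_variables(layers, multiple))
# prints:
# 8 114688
# 64 14336
```

Under these assumptions, going from multiples of 8 to multiples of 64 reduces the number of variables by a factor of 8, at the cost of a coarser set of admissible ranks.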

### Rerun Comparisons
To recreate the numbers reported in the tables, run the commands below. For more insight into the algorithms, refer to the search implementations in ```./compression/search/*``` and the SVD implementations in ```./compression/factorization/*```. All commands compress to a ratio of ```0.8``` (search comparison) or ```0.7``` (SVD comparison); to use ```0.6``` or any other ratio, change the ```--compression_target``` command line flag. The ```-uc``` flag enables caching of the decomposition and sensitivity results to accelerate runs. ```--extended_eval``` performs the extended sweep over zero-shot tasks, which increases the overall runtime; remove it for a fast WikiText-only evaluation.
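
As a rough guide to what a given target means for a single layer, the sketch below computes the rank a uniform allocation would assign to one weight matrix. It rests on two assumptions that are not stated in this README: that ```--compression_target``` is the fraction of parameters retained (which is consistent with 0.8 scoring better than 0.6 in the tables), and that a rank-`r` factorization of a `rows x cols` matrix stores `r * (rows + cols)` parameters. The function name `uniform_rank` is hypothetical.

```python
def uniform_rank(rows, cols, keep_ratio):
    """Largest rank r with r * (rows + cols) <= keep_ratio * rows * cols."""
    return int(keep_ratio * rows * cols / (rows + cols))


# Example: a square 4096 x 4096 projection matrix.
for ratio in (0.8, 0.7, 0.6):
    r = uniform_rank(4096, 4096, ratio)
    kept = r * (4096 + 4096) / (4096 * 4096)
    print(f"target {ratio}: rank {r}, retained fraction {kept:.4f}")
```

Note that under this parameter count a factorized layer only saves memory when `r` is below `rows * cols / (rows + cols)`, which is 2048 for the square example above.
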
#### Reproduce Search Comparisons
The following commands reproduce the search comparison results using the exact same decomposition setup for all approaches. **Note:** The **ARS** baseline results were obtained using a [third-party repository](https://github.com/sidhantls/adaptive-rank-selection-svd) and are therefore not included in the direct execution commands below.
<details>
<summary>Mistral-7B</summary>
<ul>
<details>
<summary>LEMS (Ours)</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method elastic --seq_len 2048 --compression_target 0.8 --measurements_points="0.1" --sensitivity_loss "energy2_normal_klscaled" --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc --extended_eval --crosslayer_term "harmonicv2"
</code></pre></ul>
</details>
<details>

<summary>Uniform</summary>

<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method uniform --seq_len 2048 --compression_target 0.8 --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc --extended_eval
</code></pre></ul>
</details>
<details>

<summary>ASVD</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method asvd --seq_len 2048 --compression_target 0.8 --measurements_points="asvd_default" --sensitivity_loss "ppl" --beta 0.01 --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc --extended_eval</code></pre></ul>
</details>
<details>

<summary>SVD-LLMv2</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method svd_llmv2 --seq_len 2048 --compression_target 0.8 --measurements_points="0.1" --sensitivity_loss "energy2" --beta 0.01 --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc --extended_eval </code></pre></ul>
</details>
<details>

<summary>MRCS</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method memvit --seq_len 2048 --compression_target 0.8 --sensitivity_loss "energy2" --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc --extended_eval
</code></pre></ul>
</details>
</ul>
</details>

<details>
<summary>Llama-3-8B</summary>
<ul>
<details>
<summary>LEMS (Ours)</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method elastic --seq_len 2048 --compression_target 0.8 --measurements_points="0.1" --sensitivity_loss "energy2_normal_klscaled" --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc --extended_eval --crosslayer_term "harmonicv2"
</code></pre></ul>
</details>
<details>

<summary>Uniform</summary>

<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method uniform --seq_len 2048 --compression_target 0.8 --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc --extended_eval
</code></pre></ul>
</details>
<details>

<summary>ASVD</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method asvd --seq_len 2048 --compression_target 0.8 --measurements_points="asvd_default" --sensitivity_loss "ppl" --beta 0.01 --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc --extended_eval</code></pre></ul>
</details>
<details>

<summary>SVD-LLMv2</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method svd_llmv2 --seq_len 2048 --compression_target 0.8 --measurements_points="0.1" --sensitivity_loss "energy2" --beta 0.01 --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc --extended_eval </code></pre></ul>
</details>
<details>

<summary>MRCS</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method memvit --seq_len 2048 --compression_target 0.8 --sensitivity_loss "energy2" --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc --extended_eval
</code></pre></ul>
</details>
</ul>
</details>

<details>
<summary>Qwen3-8B</summary>
<ul>
<details>
<summary>LEMS (Ours)</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method elastic --seq_len 2048 --compression_target 0.8 --measurements_points="0.1" --sensitivity_loss "energy2_normal_klscaled" --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc --extended_eval --crosslayer_term "harmonicv2"
</code></pre></ul>
</details>
<details>

<summary>Uniform</summary>

<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method uniform --seq_len 2048 --compression_target 0.8 --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc --extended_eval
</code></pre></ul>
</details>
<details>

<summary>ASVD</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method asvd --seq_len 2048 --compression_target 0.8 --measurements_points="asvd_default" --sensitivity_loss "ppl" --beta 0.01 --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc --extended_eval</code></pre></ul>
</details>
<details>

<summary>SVD-LLMv2</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method svd_llmv2 --seq_len 2048 --compression_target 0.8 --measurements_points="0.1" --sensitivity_loss "energy2" --beta 0.01 --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc --extended_eval </code></pre></ul>
</details>
<details>

<summary>MRCS</summary>
<ul><pre><code>python compress_LLM.py --svd_method kfac_svd --search_method memvit --seq_len 2048 --compression_target 0.8 --sensitivity_loss "energy2" --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc --extended_eval
</code></pre></ul>
</details>
</ul>
</details>

#### Reproduce SVD Comparisons

The following commands reproduce the SVD comparison results. The default commands use **Uniform** search to isolate the impact of the decomposition method.

<details>
<summary>Mistral-7B</summary>
<ul>
<details>
<summary>KFAC-SVD (Ours)</summary>
<ul>
<li><strong>Uniform:</strong></li>
<pre><code>python compress_LLM.py --svd_method kfac_svd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>+ LEMS:</strong></li>
<pre><code>python compress_LLM.py --svd_method kfac_svd --search_method elastic --seq_len 2048 --compression_target 0.7 --measurements_points="0.1" --sensitivity_loss "energy2_normal_klscaled" --extended_eval --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" --crosslayer_term "harmonicv2" -uc
</code></pre></ul>

</details>
<details>
<summary>Baselines (Uniform)</summary>
<ul>

<li><strong>FWSVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method fwsvd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>ASVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method asvd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>SVD-LLM:</strong></li>
<pre><code>python compress_LLM.py --svd_method svd_llm --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>SVD-LLMv2:</strong></li>
<pre><code>python compress_LLM.py --svd_method svd_llmv2 --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>DOBI-SVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method dobi_svd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>GFWSVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method gfwsvd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/mistral-7b" --calib_dataset "wikitext2" -uc
</code></pre>
</ul>
</details>
</ul>
</details>

<details>
<summary>Llama-3-8B</summary>
<ul>
<details>
<summary>KFAC-SVD (Ours)</summary>
<ul>
<li><strong>Uniform:</strong></li>
<pre><code>python compress_LLM.py --svd_method kfac_svd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>+ LEMS:</strong></li>
<pre><code>python compress_LLM.py --svd_method kfac_svd --search_method elastic --seq_len 2048 --compression_target 0.7 --measurements_points="0.1" --sensitivity_loss "energy2_normal_klscaled" --extended_eval --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" --crosslayer_term "harmonicv2" -uc
</code></pre></ul>

</details>
<details>
<summary>Baselines (Uniform)</summary>
<ul>

<li><strong>FWSVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method fwsvd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>ASVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method asvd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>SVD-LLM:</strong></li>
<pre><code>python compress_LLM.py --svd_method svd_llm --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>SVD-LLMv2:</strong></li>
<pre><code>python compress_LLM.py --svd_method svd_llmv2 --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>DOBI-SVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method dobi_svd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>GFWSVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method gfwsvd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "unsloth/llama-3-8b" --calib_dataset "wikitext2" -uc
</code></pre>
</ul>
</details>
</ul>
</details>

<details>
<summary>Qwen3-8B</summary>
<ul>
<details>
<summary>KFAC-SVD (Ours)</summary>
<ul>
<li><strong>Uniform:</strong></li>
<pre><code>python compress_LLM.py --svd_method kfac_svd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>+ LEMS:</strong></li>
<pre><code>python compress_LLM.py --svd_method kfac_svd --search_method elastic --seq_len 2048 --compression_target 0.7 --measurements_points="0.1" --sensitivity_loss "energy2_normal_klscaled" --extended_eval --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" --crosslayer_term "harmonicv2" -uc
</code></pre></ul>

</details>
<details>
<summary>Baselines (Uniform)</summary>
<ul>

<li><strong>FWSVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method fwsvd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>ASVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method asvd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>SVD-LLM:</strong></li>
<pre><code>python compress_LLM.py --svd_method svd_llm --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>SVD-LLMv2:</strong></li>
<pre><code>python compress_LLM.py --svd_method svd_llmv2 --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>DOBI-SVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method dobi_svd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc
</code></pre>

<li><strong>GFWSVD:</strong></li>
<pre><code>python compress_LLM.py --svd_method gfwsvd --search_method uniform --seq_len 2048 --compression_target 0.7 --extended_eval --calib_bs 256 --seed 42 --model "Qwen/Qwen3-8B" --calib_dataset "wikitext2" -uc
</code></pre>
</ul>
</details>
</ul>
</details>

#### Execution for Other Models
The provided code is flexible with respect to model architecture and should work out of the box for most Hugging Face models. You may use the commands provided above as a template for your own experiments. Note that most decomposition and search approaches are compatible with each other and can be combined by setting ```--svd_method``` and ```--search_method``` accordingly.
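
When sweeping over several models or method combinations, it can be convenient to generate the command lines programmatically. The helper below is a small sketch that only uses flags and values appearing in the commands above; `build_command` itself is a hypothetical name and not part of this repository, and method-specific flags such as ```--sensitivity_loss``` have to be passed through `extra_flags`.

```python
import shlex


def build_command(model, svd_method, search_method, compression_target,
                  extra_flags=(), extended_eval=True):
    """Assemble a compress_LLM.py call from the flags used in this README."""
    args = [
        "python", "compress_LLM.py",
        "--svd_method", svd_method,
        "--search_method", search_method,
        "--seq_len", "2048",
        "--compression_target", str(compression_target),
        "--calib_bs", "256",
        "--seed", "42",
        "--model", model,
        "--calib_dataset", "wikitext2",
        "-uc",
    ]
    # Method-specific flags, e.g. ["--sensitivity_loss", "energy2"].
    args += list(extra_flags)
    if extended_eval:
        args.append("--extended_eval")
    return shlex.join(args)


# Same settings as the Qwen3-8B KFAC-SVD + Uniform command above.
print(build_command("Qwen/Qwen3-8B", "kfac_svd", "uniform", 0.7))
```

Replacing the model identifier or the two method names yields the corresponding command for another configuration; whether a particular combination is supported still has to be checked by running it.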