# DimPO: Dimensionality Reduction for Attention using Preference Optimization

This repository provides code for applying various preference optimization methods to reduce attention dimensionality.

#### Supported Loss Functions

- **DimPO** (our novel approach)
- ORPO
- SimPO
- CPO
- Triplet
- PCA
- Rand

#### Supported Models

- `llama_1b_instruct`
- `llama_3b_instruct`
- `llama_8b_instruct`
- `qwen3_4b_instruct`
- `qwen2_7b_instruct`


#### Setup

```sh
pip install -r requirements.txt
```



## Usage

1. Load DimPO layers from checkpoints in Python.
*If a pretrained checkpoint is not available, you need to pretrain one first. See the section 'DimPO Checkpoints Pretraining'.*

```python
from src.models.lowdim_trainer import AutoLowDimAttentionsModel
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
dimpo_path = "./checkpoints/DimPO_llama_1b_instruct_32"  # d'=32
projected_attention_layers = [8, 9, 10, 11, 12, 13, 14, 15]  # l = 8

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoLowDimAttentionsModel.from_pretrained(
    model_path=model_id,
    lowdim_attentions_path=dimpo_path,
    device_map="cuda:0",
    lowdim_attn_layers=projected_attention_layers,
)
```

2. Use the loaded model

```python
prompt = "Explain why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(inputs.input_ids, max_new_tokens=256)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```



### DimPO Checkpoints Pretraining
To generate checkpoints for all attention layers of *Llama-3.2-1B-Instruct*, projecting to target dimension *d'=32*, run the following command:

```sh
python run_experiment.py \
    --model-name "llama_1b_instruct" \
    --target-dim "32" \
    --experiment-type "DimPO" \
    --checkpoint-path "./checkpoints/DimPO_llama_1b_instruct_32"
```
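
Before loading a model it can help to check whether the checkpoint directory already exists. The sketch below is a hypothetical helper, not part of this repository: `checkpoint_dir` and `has_checkpoint` are illustrative names, and the `<experiment>_<model>_<dim>` naming pattern is only inferred from the example path above.

```python
from pathlib import Path


def checkpoint_dir(experiment_type: str, model_name: str, target_dim: int,
                   root: str = "./checkpoints") -> str:
    """Build a path such as ./checkpoints/DimPO_llama_1b_instruct_32.

    Assumption: this naming pattern mirrors the example in the README;
    run_experiment.py accepts any --checkpoint-path.
    """
    return f"{root.rstrip('/')}/{experiment_type}_{model_name}_{target_dim}"


def has_checkpoint(path: str) -> bool:
    """Return True if the directory exists and contains at least one entry."""
    directory = Path(path)
    return directory.is_dir() and any(directory.iterdir())


if __name__ == "__main__":
    path = checkpoint_dir("DimPO", "llama_1b_instruct", 32)
    if has_checkpoint(path):
        print(f"Found checkpoints in {path}")
    else:
        print(f"No checkpoints in {path}; run run_experiment.py first.")
```

The check only looks for a non-empty directory; it does not verify that every attention layer has a saved projection.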



### Run Harness Tasks
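1. Install the `lm_eval` harness library, version 0.4.9.1 (for example, `pip install lm_eval==0.4.9.1`).
2. Run the following command, replacing `model_path` with the location of your model and `--tasks` with the tasks you want to evaluate.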

```sh
python ./evaluate_harness.py \
    --model LowDimAttentionsModel \
    --model_args "model_path=/media/data/vojtech/models/Llama-3.2-1B-Instruct,lowdim_attentions_path=./checkpoints/DimPO_llama_1b_instruct_32,lowdim_attn_layers=8 9 10 11 12 13 14 15" \
    --tasks hellaswag,arc_challenge,mmlu,truthfulqa_mc2,winogrande \
    --output_path ./harness_results/llama_1b/32_8 \
    --num_fewshot 0 --device auto --batch_size 16
```
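
The `--model_args` value is a comma-separated list of `key=value` pairs, so the layer indices are separated by spaces rather than commas. A minimal sketch of assembling that string from Python values follows; `format_model_args` is a hypothetical helper, and using the Hugging Face model id as `model_path` assumes the evaluation wrapper accepts the same value as the loader in the Usage section.

```python
def format_model_args(model_path: str, lowdim_attentions_path: str, layers) -> str:
    """Join the three arguments into one comma-separated key=value string.

    Layer indices are joined with spaces because commas already
    separate the key=value pairs.
    """
    pairs = {
        "model_path": model_path,
        "lowdim_attentions_path": lowdim_attentions_path,
        "lowdim_attn_layers": " ".join(str(i) for i in layers),
    }
    return ",".join(f"{key}={value}" for key, value in pairs.items())


if __name__ == "__main__":
    args = format_model_args(
        "meta-llama/Llama-3.2-1B-Instruct",  # assumption: a hub id works here
        "./checkpoints/DimPO_llama_1b_instruct_32",
        range(8, 16),  # layers 8..15, i.e. l = 8
    )
    print(args)
```

Remember to quote the resulting string on the command line, since it contains spaces.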



### Experiments: Effect of Key-Pair Selection on Pairwise Losses

#### All Key Pairs

```sh
python run_experiment.py \
    --model-name "llama_1b_instruct" \
    --target-dim "32" \
    --experiment-type "DimPO" \
    --experimental-mode "all_pairs" \
    --num-sampled-keys "8"
```

#### Multiple Distinct Pairs

```sh
python run_experiment.py \
    --model-name "llama_1b_instruct" \
    --target-dim "32" \
    --experiment-type "DimPO" \
    --experimental-mode "multiple_distinct_pairs" \
    --num-sampled-pairs "32"
```


#### Level of Key Diversity

```sh
python run_experiment.py \
    --model-name "llama_1b_instruct" \
    --target-dim "32" \
    --experiment-type "DimPO" \
    --experimental-mode "pair_fixed_distance" \
    --key-vector-pair-distance "127"
```
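
The three experiments above differ only in the experimental mode and one mode-specific flag, so they can be generated from a small table. The following is a sketch under that assumption: `experiment_commands` is a hypothetical helper, and the flag values are simply the ones shown in the commands above.

```python
import shlex

# Arguments shared by all three commands above.
BASE = [
    "python", "run_experiment.py",
    "--model-name", "llama_1b_instruct",
    "--target-dim", "32",
    "--experiment-type", "DimPO",
]

# Experimental mode -> (mode-specific flag, value), copied from the commands above.
MODES = {
    "all_pairs": ("--num-sampled-keys", "8"),
    "multiple_distinct_pairs": ("--num-sampled-pairs", "32"),
    "pair_fixed_distance": ("--key-vector-pair-distance", "127"),
}


def experiment_commands(base=BASE, modes=MODES):
    """Return one argument list per experimental mode."""
    return [
        [*base, "--experimental-mode", mode, flag, value]
        for mode, (flag, value) in modes.items()
    ]


if __name__ == "__main__":
    for command in experiment_commands():
        # Print only; pass the list to subprocess.run(command, check=True) to launch.
        print(shlex.join(command))
```

Keeping the commands as argument lists avoids shell quoting issues if a value ever contains spaces.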