# Source code for the paper "SVD-0: Enhancing Zeroth-Order Subspace Fine-Tuning for Language Models through Gradient-Informed Projection"

In the paper, we propose SVD-0, a novel gradient-informed subspace optimization framework that synergizes zeroth-order efficiency with principled subspace discovery. 
Our key insight is that, while exact first-order gradients remain inaccessible due to memory constraints, ZO gradient estimates contain sufficient directional information to reconstruct high-fidelity subspaces. 
Specifically, SVD-0 periodically performs singular value decomposition (SVD) on ZO gradient estimates to derive layer-wise projection matrices that capture dominant optimization directions. 
By preserving the intrinsic structure of the subspace, our method improves the performance of subspace-based ZO methods.
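
The idea described above can be illustrated with a minimal NumPy sketch. It is not the implementation in this repository: the function names (`zo_gradient_estimate`, `svd_projections`, `subspace_zo_step`), the toy quadratic loss, and the choice to average several two-point estimates before the SVD are all assumptions made for illustration.

```python
import numpy as np


def zo_gradient_estimate(loss_fn, W, eps=1e-3, n_samples=8, rng=None):
    """Average of two-point (SPSA-style) zeroth-order gradient estimates.

    Only forward evaluations of loss_fn are used; no backpropagation.
    Averaging over n_samples directions is an assumption of this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    G = np.zeros_like(W)
    for _ in range(n_samples):
        Z = rng.standard_normal(W.shape)  # random perturbation direction
        diff = loss_fn(W + eps * Z) - loss_fn(W - eps * Z)
        G += (diff / (2.0 * eps)) * Z  # directional derivative times direction
    return G / n_samples


def svd_projections(G, rank):
    """Top-`rank` left and right singular vectors of the ZO gradient estimate."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U[:, :rank], Vt[:rank, :].T


def subspace_zo_step(loss_fn, W, U, V, lr=1e-2, eps=1e-3, rng=None):
    """One ZO-SGD step whose perturbation is restricted to the (U, V) subspace."""
    rng = np.random.default_rng() if rng is None else rng
    Z_small = rng.standard_normal((U.shape[1], V.shape[1]))  # rank x rank
    P = U @ Z_small @ V.T  # low-rank perturbation in the full weight space
    diff = loss_fn(W + eps * P) - loss_fn(W - eps * P)
    return W - lr * (diff / (2.0 * eps)) * P


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy problem (hypothetical): recover a rank-4 target matrix.
    target = rng.standard_normal((32, 4)) @ rng.standard_normal((4, 16))
    loss = lambda W: 0.5 * np.sum((W - target) ** 2)

    W = np.zeros((32, 16))
    step_interval, rank = 100, 4
    for step in range(300):
        if step % step_interval == 0:  # periodically refresh the projections
            G = zo_gradient_estimate(loss, W, rng=rng)
            U, V = svd_projections(G, rank)
        W = subspace_zo_step(loss, W, U, V, rng=rng)
    print("final loss:", loss(W))
```

Here `step_interval` and `rank` play the same role as the `STEP_INTERVAL` and `RANK` settings used by the scripts; how the actual trainer forms its gradient estimate and projections should be checked against the code in `large_models` and `medium_models`.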

## Environment
- We use Python 3.10, torch 2.1.0, transformers 4.28.1, and CUDA 11.8.0.

You can install the dependencies (for example, inside a conda environment) with the following command:
```bash
    pip install -r requirements.txt
```

## Run on large models

The `large_models` folder provides a script for running the code on large models.
```bash
    bash ./large_models/run.sh
```

You can modify the script to run different models with different hyperparameters.
For example, to run OPT-1.3b with SVD-0, you can set the model and trainer as follows:
```bash
    bash ./large_models/run.sh --model_name facebook/opt-1.3b --trainer svd_sgd
```
Please refer to the script for more details.

## Run on medium models

The `medium_models` folder contains the code for medium models, along with a script to run it:
```bash
    bash ./medium_models/svd.sh
```

Before running the script, please prepare the data in the `./medium_models/data/origin` folder.
You can download the data from [here](https://nlp.cs.princeton.edu/projects/lm-bff/datasets.tar).

After that, generate the k-shot data as follows:
```bash
    python tools/generate_k_shot_data.py --mode k-shot-1k-test --k 512
```
As in the paper, we use five different seeds: 13, 21, 42, 87, and 100.

Now you can run the script as follows:
```bash
    TASK=SST-2 K=512 SEED=42 BS=64 LR=1e-6 RANK=16 STEP_INTERVAL=100 MODEL=roberta-large bash ./medium_models/svd.sh
```
The supported datasets are `SST-2`, `sst-5`, `SNLI`, and `MNLI`.
For more details, please refer to the scripts in the `./medium_models` folder.
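
To repeat the run over the five seeds used in the paper, a small driver along the following lines may help. This is a hypothetical helper, not part of the repository: `build_env` and `run_all` are names introduced here, the hyperparameter values are copied from the example command above, and the default is a dry run that only prints each command.

```python
import os
import subprocess

SEEDS = [13, 21, 42, 87, 100]  # seeds listed in this README


def build_env(seed, task="SST-2"):
    """Environment overrides for one run, mirroring the example command."""
    return {
        "TASK": task,
        "K": "512",
        "SEED": str(seed),
        "BS": "64",
        "LR": "1e-6",
        "RANK": "16",
        "STEP_INTERVAL": "100",
        "MODEL": "roberta-large",
    }


def run_all(seeds=SEEDS, task="SST-2", dry_run=True):
    """Print (dry run) or launch svd.sh once per seed."""
    cmd = ["bash", "./medium_models/svd.sh"]
    for seed in seeds:
        overrides = build_env(seed, task)
        if dry_run:
            prefix = " ".join(f"{k}={v}" for k, v in overrides.items())
            print(prefix, " ".join(cmd))
        else:
            # Pass the overrides on top of the current environment.
            subprocess.run(cmd, env={**os.environ, **overrides}, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` from the repository root would launch the runs one after another; the learning rate and batch size likely need adjusting per task.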


## Additional Information

This project is a reimplementation and extension of the methods introduced in [Zeroth-Order Fine-Tuning of LLMs in Random Subspaces](https://github.com/zimingyy/SubZero) and [LOZO: Enhancing Zeroth-order fine-tuning for language models with low-rank structures](https://github.com/optsuite/LOZO). The original code is released under the license of each respective project (see, for example, the SubZero [License](https://github.com/zimingyy/SubZero/blob/main/LICENSE)).