# Prune to Fit: Enabling Federated Fine-Tuning within Edge Memory Budgets

Federated fine-tuning enables privacy-preserving Large Language Model (LLM) adaptation, but its high memory cost limits participation from resource-constrained devices. We propose FedPruner, an innovative federated fine-tuning paradigm that tackles this via intelligent layer pruning. FedPruner flexibly prunes the global model, creating personalized submodels based on device memory constraints. It employs a macro-micro synergistic pruning framework: a macro-level functionality-driven layer orchestration mechanism groups layers, while a micro-level importance-aware layer selection strategy prunes within groups to build device-specific submodels. We further introduce a fine-grained variant that independently prunes Multi-Head Attention and Feed-Forward Network components to precisely preserve critical architectural elements. Extensive experiments demonstrate that FedPruner significantly outperforms state-of-the-art methods with average accuracy gains of up to 11.11%. Moreover, it maintains strong robustness under varying memory constraints, yielding a 1.98% average performance improvement while reducing peak memory usage by 75%.
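
The macro-micro idea can be illustrated with a minimal sketch. It is not the FedPruner implementation: it assumes that layers are grouped into contiguous blocks, that a per-layer importance score is already available, and that every layer costs the same amount of memory. The names `layer_budget` and `select_layers` are hypothetical.

```python
from typing import List


def layer_budget(memory_budget: float, per_layer_cost: float, num_layers: int) -> int:
    """Number of layers that fit in the budget (assumes a uniform per-layer cost)."""
    return max(1, min(num_layers, int(memory_budget // per_layer_cost)))


def select_layers(importance: List[float], num_groups: int, keep: int) -> List[int]:
    """Return `keep` layer indices, spreading the picks across contiguous groups."""
    n = len(importance)
    if not 1 <= num_groups <= n:
        raise ValueError("num_groups must be between 1 and the number of layers")
    if not 1 <= keep <= n:
        raise ValueError("keep must be between 1 and the number of layers")

    # Macro level (assumption): split the layers into contiguous groups.
    bounds = [g * n // num_groups for g in range(num_groups + 1)]
    # Micro level: rank the layers inside each group by importance, best first.
    groups = [
        sorted(range(bounds[g], bounds[g + 1]), key=lambda i: importance[i], reverse=True)
        for g in range(num_groups)
    ]

    # Take the best remaining layer of each group in turn until the quota is met,
    # so every group stays represented whenever keep >= num_groups.
    kept: List[int] = []
    rank = 0
    while len(kept) < keep:
        for ranked in groups:
            if rank < len(ranked) and len(kept) < keep:
                kept.append(ranked[rank])
        rank += 1
    return sorted(kept)


scores = [0.9, 0.1, 0.5, 0.4, 0.2, 0.8, 0.3, 0.7]  # made-up importance scores
submodel_layers = select_layers(scores, num_groups=4, keep=4)  # [0, 2, 5, 7]
```

A device with a smaller budget would simply call `select_layers` with a smaller `keep`, which is how a sketch like this yields device-specific submodels from one global model.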

## Setup

Clone the repository and its submodules, then install the required packages.

```sh
git clone xxx
cd FedPruner
conda create -n fedllm python=3.10
conda activate fedllm
pip install -r requirements.txt
```

## Training

We provide training scripts under `training_scripts/`. Try them out from the top-level directory of this repository.

### Federated Instruction Tuning

The training script is in `training_scripts/run_sft.sh`.

```sh
CUDA_VISIBLE_DEVICES=1 python main_sft.py \
 --model_name_or_path "meta-llama/Llama-2-7b-hf" \
 --dataset_name "vicgalle/alpaca-gpt4" \
 --dataset_sample 20000 \
 --fed_alg "fedavg" \
 --num_clients 20 \
 --sample_clients 2 \
 --max_steps 10 \
 --num_rounds 200 \
 --batch_size 16 \
 --gradient_accumulation_steps 1 \
 --seq_length 512 \
 --peft_lora_r 32 \
 --peft_lora_alpha 64 \
 --use_peft \
 --load_in_8bit \
 --output_dir "./output" \
 --template "alpaca"
```

Key arguments:

- `model_name_or_path`: the name or local path of your base model.
- `template`: the chat template. Define your own template in `utils/template.py`.
- `dataset_name`: the name of the dataset. You may modify `utils/process_dataset.py` if the dataset you are interested in is not yet supported.
- `dataset_sample`: set this if you want to sample a specific number of examples from the original dataset.
- `fed_alg`: the name of the federated learning algorithm.
- `num_clients`/`sample_clients`: `num_clients` is the total number of clients; `sample_clients` is the number of clients sampled in each round.
- `max_steps`: the number of model update steps each client performs per round.
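
To make the interplay of `num_clients`, `sample_clients`, and `max_steps` concrete, here is a minimal sketch of one FedAvg-style round. It assumes uniform client sampling and dataset-size-weighted averaging. The function names (`sample_round_clients`, `fedavg`) and the toy one-parameter "model" are hypothetical and are not taken from `main_sft.py`.

```python
import random
from typing import Dict, List


def sample_round_clients(num_clients: int, sample_clients: int, rng: random.Random) -> List[int]:
    """Pick the clients that participate in this round, without replacement."""
    return sorted(rng.sample(range(num_clients), sample_clients))


def fedavg(updates: List[Dict[str, float]], sizes: List[int]) -> Dict[str, float]:
    """Average client parameters, weighting each client by its local dataset size."""
    total = sum(sizes)
    return {
        name: sum(update[name] * size for update, size in zip(updates, sizes)) / total
        for name in updates[0]
    }


rng = random.Random(0)
participants = sample_round_clients(num_clients=20, sample_clients=2, rng=rng)

# Toy stand-in for the parameters each sampled client returns after
# `max_steps` local updates; real updates would be LoRA weight tensors.
client_updates = [{"w": 1.0}, {"w": 3.0}]
client_sizes = [1, 3]
global_update = fedavg(client_updates, client_sizes)  # {"w": 2.5}
```

With the values in the command above, each round would sample 2 of the 20 clients and each sampled client would run 10 local update steps, i.e. 2 × 10 × 200 = 4,000 local steps over the whole run, assuming every round completes.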

## Evaluation

The evaluation code is in the `evaluation/` directory. Our close-ended evaluations follow [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). For example, a fine-tuned TinyLlama model can be evaluated on MMLU with

```sh
lm_eval --model hf --model_args pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0,parallelize=True,peft=your_lora_model_file,load_in_4bit=False --tasks mmlu --num_fewshot 5 --device cuda
```
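
When evaluating several LoRA checkpoints, it can be convenient to assemble this command programmatically. The following is a minimal sketch that only reuses the flags shown above. The helper name `build_lm_eval_command` is hypothetical, and `your_lora_model_file` remains a placeholder for your own adapter path.

```python
import shlex
from typing import List


def build_lm_eval_command(
    base_model: str,
    lora_path: str,
    task: str = "mmlu",
    num_fewshot: int = 5,
) -> List[str]:
    """Build the lm_eval argument list for one base model and LoRA adapter."""
    model_args = ",".join(
        [
            f"pretrained={base_model}",
            "parallelize=True",
            f"peft={lora_path}",
            "load_in_4bit=False",
        ]
    )
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", task,
        "--num_fewshot", str(num_fewshot),
        "--device", "cuda",
    ]


command = build_lm_eval_command(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "your_lora_model_file",  # placeholder: path to your fine-tuned adapter
)
command_line = shlex.join(command)  # printable form for logs or shell scripts
```

The list form can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting issues when adapter paths contain spaces.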

Our open-ended evaluations follow existing high-influence open-source repositories. Please refer to each sub-directory for its detailed README and run script. For example, `evaluation/open_ended/` contains open-ended evaluations on three benchmarks, including MT-Bench and Vicuna Bench; see [README.md](evaluation/open_ended/README.md).

## Citation

Please cite our paper if you find the repository helpful.

```
Bibtex to be updated ...
```

## Acknowledgement

We thank OpenFedLLM for its open-source federated learning framework.
