# ZOSA: Zero-Order Sharpness-Aware Minimization

In this work, we propose ZOSA, a novel zero-order sharpness-aware minimization framework for efficient prompt tuning of large language models in resource-constrained environments. ZOSA integrates batched Rademacher perturbations for gradient estimation, adaptive loss-variance scaling for stability, and sharpness-aware mechanisms that target flat minima. Theoretical analysis establishes $O(1/\sqrt{T})$ convergence under smoothness and bounded-variance assumptions, with PAC-Bayesian bounds linking sharpness control to improved generalization. Empirical evaluations on synthetic high-dimensional functions and on zero-order prompt fine-tuning across GLUE benchmarks show faster convergence, higher cosine similarity in gradient estimates, and better accuracy/F1 scores than adaptive ZO baselines such as ZO-AdaMM and evolutionary methods. These results underscore ZOSA's robustness in noisy, high-dimensional landscapes, making it a practical solution for zero-order LLM adaptation.
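
To make the ingredients above concrete, here is a minimal NumPy sketch of one zero-order, sharpness-aware update on a toy quadratic. It is an illustration under stated assumptions, not the code in `bbt_zosa_zt.py`: the function names `zo_gradient` and `zo_sam_step` are hypothetical, the forward-difference estimator is one common choice, and the adaptive loss-variance scaling is left out because its exact form is not given in this README. The argument names `lr`, `rho`, `eps`, and `m` mirror the hyperparameter columns listed under Usage, but their exact roles in the repository are assumed.

```python
import numpy as np

def zo_gradient(loss_fn, x, eps, m, rng):
    """Estimate the gradient of loss_fn at x from m Rademacher probes."""
    # Each row of u is one random direction with entries in {-1, +1}.
    u = rng.choice([-1.0, 1.0], size=(m, x.size))
    base = loss_fn(x)
    # Forward differences: one extra loss evaluation per direction.
    diffs = np.array([loss_fn(x + eps * ui) - base for ui in u])
    # Average the per-direction estimates over the batch.
    return (diffs[:, None] * u).mean(axis=0) / eps

def zo_sam_step(loss_fn, x, lr, rho, eps, m, rng):
    """One sharpness-aware step that uses only loss evaluations."""
    # Step 1: estimate the gradient at the current point.
    g = zo_gradient(loss_fn, x, eps, m, rng)
    # Step 2: move to a nearby "worst-case" point at distance rho (assumed SAM-style ascent).
    x_adv = x + rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 3: estimate the gradient there and descend from the original point.
    g_adv = zo_gradient(loss_fn, x_adv, eps, m, rng)
    return x - lr * g_adv

def quadratic(x):
    """Toy objective used only for this demonstration."""
    return 0.5 * float(x @ x)

rng = np.random.default_rng(0)
x = np.ones(20)
f_start = quadratic(x)
for _ in range(200):
    x = zo_sam_step(quadratic, x, lr=0.05, rho=0.05, eps=1e-3, m=8, rng=rng)
f_final = quadratic(x)
print(f_start, f_final)
```

Each step costs `2 * (m + 1)` loss evaluations and no backpropagation, which is the trade-off that makes zero-order methods attractive when gradients are unavailable or too expensive to compute.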

## Installation

```bash
conda env create -f bbt.yaml
```

## Usage

Use `bbt_zosa_zt.py` for zero-order prompt fine-tuning:

```bash
bash run_zosa.sh
```

The options and per-task hyperparameters below reproduce our experiments.

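```bash
--task_name "sst2" "rte" "snli" "mrpc" "agnews" "yelpp"
--intrinsic_dim 500 200 1000
```

`d=500`

| task   | lr   | rho  | eps  | m |
|--------|------|------|------|---|
| sst2   | 8e-5 | 2e-4 | 1e-4 | 8 |
| rte    | 1e-6 | 0.05 | 1e-6 | 4 |
| mrpc   | 2e-5 | 0.50 | 1e-4 | 8 |
| agnews | 5e-6 | 9e-6 | 1e-5 | 8 |
| snli   | 5e-6 | 0.25 | 1e-5 | 4 |
| yelpp  | 5e-5 | 6e-6 | 1e-4 | 8 |

`d=1000`

| task   | lr   | rho  | eps  | m |
|--------|------|------|------|---|
| sst2   | 2e-4 | 1e-3 | 1e-3 | 4 |
| rte    | 1e-6 | 0.20 | 1e-6 | 8 |
| mrpc   | 5e-6 | 0.03 | 1e-4 | 4 |
| agnews | 5e-6 | 3e-6 | 1e-5 | 8 |
| snli   | 5e-7 | 7e-6 | 1e-5 | 8 |
| yelpp  | 4e-5 | 5e-6 | 1e-4 | 8 |

`d=200`

| task   | lr   | rho  | eps  | m |
|--------|------|------|------|---|
| sst2   | 4e-4 | 4e-4 | 1e-4 | 8 |
| rte    | 0.10 | 0.10 | 1e-5 | 8 |
| mrpc   | 2e-5 | 0.20 | 1e-4 | 4 |
| agnews | 3e-6 | 2e-5 | 1e-5 | 8 |
| snli   | 8e-7 | 7e-6 | 1e-5 | 4 |
| yelpp  | 3e-5 | 2e-6 | 1e-4 | 8 |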
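
For scripted sweeps, the settings above can be kept in a small lookup table. The snippet below is only a convenience sketch: `HPARAMS` and `get_hparams` are hypothetical names that do not come from this repository, and the values are copied verbatim from the listing above, in the column order `lr`, `rho`, `eps`, `m`. How these values are passed to `bbt_zosa_zt.py` is not specified here, so the helper only returns them.

```python
# Hyperparameters keyed by d, then by task.
# Tuple order: (lr, rho, eps, m).
HPARAMS = {
    500: {
        "sst2":   (8e-5, 2e-4, 1e-4, 8),
        "rte":    (1e-6, 0.05, 1e-6, 4),
        "mrpc":   (2e-5, 0.50, 1e-4, 8),
        "agnews": (5e-6, 9e-6, 1e-5, 8),
        "snli":   (5e-6, 0.25, 1e-5, 4),
        "yelpp":  (5e-5, 6e-6, 1e-4, 8),
    },
    1000: {
        "sst2":   (2e-4, 1e-3, 1e-3, 4),
        "rte":    (1e-6, 0.20, 1e-6, 8),
        "mrpc":   (5e-6, 0.03, 1e-4, 4),
        "agnews": (5e-6, 3e-6, 1e-5, 8),
        "snli":   (5e-7, 7e-6, 1e-5, 8),
        "yelpp":  (4e-5, 5e-6, 1e-4, 8),
    },
    200: {
        "sst2":   (4e-4, 4e-4, 1e-4, 8),
        "rte":    (0.10, 0.10, 1e-5, 8),
        "mrpc":   (2e-5, 0.20, 1e-4, 4),
        "agnews": (3e-6, 2e-5, 1e-5, 8),
        "snli":   (8e-7, 7e-6, 1e-5, 4),
        "yelpp":  (3e-5, 2e-6, 1e-4, 8),
    },
}

def get_hparams(task, d):
    """Return the listed settings for a task and dimension d as a dict."""
    lr, rho, eps, m = HPARAMS[d][task]
    return {"lr": lr, "rho": rho, "eps": eps, "m": m}

print(get_hparams("sst2", 500))
```

An unknown task or dimension raises `KeyError`, which is usually preferable to silently falling back to defaults in a reproduction run.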

