Abstract: Large Language Models (LLMs) are increasingly deployed on resource-constrained edge devices, where 4-bit post-training quantization is a dominant tool for reducing memory footprint. A central but underexplored question is whether the choice of fine-tuning optimizer affects how gracefully a model degrades under subsequent aggressive quantization. Recent work has shown that Muon-pretrained models exhibit fewer activation-channel outliers, and correspondingly lower accuracy degradation under 4-bit quantization, than Adam-pretrained models (Park et al., 2025), but this observation has been confined to the pretraining regime. In this work, we test whether this quantization robustness extends to the parameter-efficient fine-tuning regime relevant to practical edge deployment. We integrate synthetic data generation, logit-based knowledge distillation from a vocabulary-aligned teacher, LoRA fine-tuning, Bayesian hyperparameter optimization (HPO), and GPTQ 4-bit quantization into a single end-to-end pipeline. We use this pipeline as a controlled testbed to compare Adam-optimized and Muon-optimized LoRA fine-tuning across eight standard LLM benchmarks, with five HPO replicates per condition for statistical reporting. We report two primary empirical findings. First, Muon-optimized LoRA fine-tuning yields models that degrade less under 4-bit quantization than their Adam-optimized counterparts on six of eight benchmarks, extending the pretraining-era observation of Park et al. (2025) to the fine-tuning regime. Second, Bayesian hyperparameter optimization consistently selects pure KL-divergence alignment with the teacher (alpha = 1), indicating that on synthetic distillation data the teacher’s output distribution is the dominant training signal relative to supervised cross-entropy. The full pipeline achieves approximately 2× memory compression (e.g., 6 GB to 3 GB) and up to 50% lower per-token latency while matching or exceeding naive GPTQ quantization on every benchmark studied.
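To make the role of alpha concrete, the following is a minimal sketch of a blended distillation objective for a single token position, not the paper's implementation. It assumes the common convention loss = alpha * KL(teacher || student) + (1 - alpha) * cross-entropy, which is consistent with the abstract's statement that alpha = 1 corresponds to pure KL-divergence alignment. The function names, the KL direction, the absence of a softmax temperature, and the example logits are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target_index, alpha):
    """Blend teacher alignment and supervised loss for one token position.

    Assumed convention (not confirmed by the source):
        loss = alpha * KL(teacher || student) + (1 - alpha) * CE(student, target)
    so alpha = 1 is pure KL alignment and alpha = 0 is pure cross-entropy.
    """
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    # KL(teacher || student): penalizes student mass missing where the teacher has mass.
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    # Cross-entropy of the student against the hard label.
    ce = -math.log(p_student[target_index])
    return alpha * kl + (1.0 - alpha) * ce


# Hypothetical logits over a three-token vocabulary.
student = [2.0, 0.5, -1.0]
teacher = [1.5, 1.0, -0.5]

loss_pure_kl = distillation_loss(student, teacher, target_index=0, alpha=1.0)
loss_pure_ce = distillation_loss(student, teacher, target_index=0, alpha=0.0)
```

With alpha = 1.0 the hard label has no influence on the loss, so the student is trained only to match the teacher's output distribution. This is the setting the abstract reports being selected by hyperparameter optimization.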
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Changyou_Chen1
Submission Number: 6872