Keywords: Parameter-Efficient Fine-Tuning, LoRA, On-Device Learning, Memory-Efficient Training, Adapter Placement, Structural Sparsity, Green AI, Transformers
Abstract: Fine-tuning encoder-based Transformers in memory-constrained settings (e.g., edge devices or 8-16 GB GPUs) is often limited by peak VRAM rather than wall-clock time. We propose Sensitivity-Aware Adapter Placement (SAAP), a parameter-efficient fine-tuning method that selectively instantiates low-rank adapters only in task-sensitive modules. SAAP identifies adapter locations using an activation-weighted squared-gradient score ($a \cdot g^{2}$) computed in a single probing pass, incurring a modest one-time overhead (100-300 seconds) and requiring no architectural changes. Across DistilBERT, BERT-base, and RoBERTa-base on IMDb, AG News, Yelp Polarity, and TweetEval: Hate, SAAP updates far fewer parameters than standard LoRA while maintaining comparable accuracy. On BERT-base (IMDb), SAAP reduces trainable parameters from 1.20M to 0.03M (a >97% reduction) and lowers peak training VRAM from approximately 3.5 GB to 0.9 GB. Overall, SAAP improves the accuracy-per-parameter trade-off and provides a transparent, drop-in solution for memory-efficient fine-tuning.
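The abstract's selection criterion can be illustrated with a minimal sketch: rank candidate adapter sites by an activation-weighted squared-gradient score ($a \cdot g^{2}$) collected during a short probing pass. The module names, shapes, and the exact aggregation (mean magnitudes) below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sensitivity_scores(activations, grads):
    """Rank candidate adapter sites by an a * g^2 sensitivity score.

    activations: dict mapping module name -> input-activation array from probing
    grads: dict mapping module name -> weight-gradient array from probing
    Returns module names sorted by descending score. The mean-magnitude
    aggregation here is an assumption for illustration.
    """
    scores = {}
    for name in activations:
        a = np.abs(activations[name]).mean()    # activation magnitude
        g2 = np.square(grads[name]).mean()      # squared-gradient magnitude
        scores[name] = a * g2                   # a . g^2 sensitivity
    return sorted(scores, key=scores.get, reverse=True)

# Toy probing data for two hypothetical attention projections.
rng = np.random.default_rng(0)
acts = {"layer0.query": rng.normal(0, 1.0, (8, 16)),
        "layer0.value": rng.normal(0, 0.1, (8, 16))}
grads = {"layer0.query": rng.normal(0, 1.0, (16, 16)),
         "layer0.value": rng.normal(0, 0.01, (16, 16))}
ranking = sensitivity_scores(acts, grads)
print(ranking[0])  # the high-activation, high-gradient site ranks first
```

In a full pipeline, only the top-ranked modules would receive LoRA adapters, which is what drives the reduction in trainable parameters reported above.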
Paper Type: Short
Research Area: LLM Efficiency
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 1442