Keywords: Parameter-Efficient Fine-Tuning, LoRA, On-Device Learning, Memory-Efficient Training, Adapter Placement, Structural Sparsity, Green AI, Transformers
Abstract: Fine-tuning encoder-based Transformers in memory-constrained settings (e.g., edge devices or 8-16 GB GPUs) is often limited by peak VRAM rather than wall-clock time. We propose Sensitivity-Aware Adapter Placement (SAAP), a parameter-efficient fine-tuning method that selectively instantiates low-rank adapters only in task-sensitive modules. SAAP identifies adapter locations using an activation-weighted squared-gradient score ($a \cdot g^{2}$) computed in a single probing pass, incurring a modest one-time overhead (100-300 seconds) and requiring no architectural changes. Across DistilBERT, BERT-base, and RoBERTa-base on IMDb, AG News, Yelp Polarity, and TweetEval: Hate, SAAP updates far fewer parameters than standard LoRA while maintaining comparable accuracy. On BERT-base (IMDb), SAAP reduces trainable parameters from 1.20M to 0.03M (a >97% reduction) and lowers peak training VRAM from approximately 3.5 GB to 0.9 GB. Overall, SAAP improves the accuracy-per-parameter trade-off and provides a transparent, drop-in solution for memory-efficient fine-tuning.
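The abstract's selection criterion can be illustrated with a minimal sketch: rank candidate adapter sites by an activation-weighted squared-gradient score ($a \cdot g^{2}$) collected during a short probing pass. The module names, shapes, and the exact aggregation (mean magnitudes) below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sensitivity_scores(activations, grads):
    """Rank candidate adapter sites by an a * g^2 sensitivity score.

    activations: dict mapping module name -> input-activation array from probing
    grads: dict mapping module name -> weight-gradient array from probing
    Returns module names sorted by descending score. The mean-magnitude
    aggregation here is an assumption for illustration.
    """
    scores = {}
    for name in activations:
        a = np.abs(activations[name]).mean()    # activation magnitude
        g2 = np.square(grads[name]).mean()      # squared-gradient magnitude
        scores[name] = a * g2                   # a . g^2 sensitivity
    return sorted(scores, key=scores.get, reverse=True)

# Toy probing data for two hypothetical attention projections.
rng = np.random.default_rng(0)
acts = {"layer0.query": rng.normal(0, 1.0, (8, 16)),
        "layer0.value": rng.normal(0, 0.1, (8, 16))}
grads = {"layer0.query": rng.normal(0, 1.0, (16, 16)),
         "layer0.value": rng.normal(0, 0.01, (16, 16))}
ranking = sensitivity_scores(acts, grads)
print(ranking[0])  # the high-activation, high-gradient site ranks first
```

In a full pipeline, only the top-ranked modules would receive LoRA adapters, which is what drives the reduction in trainable parameters reported above.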
Paper Type: Short
Research Area: LLM Efficiency
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 1442