CrossSpectra: Exploiting Cross-Layer Smoothness for Parameter-Efficient Fine-Tuning

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Parameter-efficient fine-tuning (PEFT)
TL;DR: How to exploit the coherence between layers to design a PEFT method.
Abstract: Parameter-efficient fine-tuning (PEFT) is essential for adapting large foundation models without excessive storage cost. However, current approaches such as LoRA treat each layer’s adaptation independently, overlooking correlations across layers. This independence causes the number of trainable parameters to grow linearly with model depth. We provide theoretical and empirical evidence that skip connections in transformers create smooth gradient propagation across layers. This smoothness leads to weight adaptations that concentrate most of their energy in low-frequency spectral components, especially along the layer dimension. Empirical analysis confirms this effect, showing that most of the adaptation energy lies in low frequencies. Building on this insight, we propose CrossSpectra, which parameterizes all attention-weight adaptations $(Q, K, V)$ across layers as a single 3D tensor and represents them with sparse spectral coefficients ($\kappa_1, \kappa_2$). Using $\kappa_{1}$ non-zero coefficients within each layer’s frequency space and truncating to $\kappa_{2}$ frequencies across layers, CrossSpectra requires $\mathcal{O}(\kappa_{1}\kappa_{2})$ parameters instead of LoRA’s $\mathcal{O}(Lrd)$, where $L$ is the number of layers, $r$ the rank, and $d$ the hidden dimension. Across natural-language and vision benchmarks, CrossSpectra matches or surpasses baseline performance while using fewer parameters than LoRA, achieving only $0.36\%$ of LoRA’s parameter count when fine-tuning LLaMA-7B on instruction-following tasks. These results show that exploiting the \textbf{architectural smoothness of transformers} through spectral analysis yields major efficiency gains in PEFT.
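For concreteness, below is a minimal PyTorch sketch of the kind of parameterization the abstract describes: a stack of per-layer weight updates represented by $\kappa_1 \cdot \kappa_2$ trainable spectral coefficients and reconstructed with an inverse 3D FFT. The module name `CrossSpectraDelta`, the random choice of frequency indices, and the plain inverse FFT are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch, assuming PyTorch and a simple inverse-3D-FFT reconstruction.
# The frequency-index selection, normalization, and Q/K/V packing here are
# placeholders; the paper's actual parameterization may differ.
import torch
import torch.nn as nn


class CrossSpectraDelta(nn.Module):
    """Represents the stack of weight updates delta_W[l], l = 1..L, as a 3D
    tensor built from kappa2 retained frequencies along the layer axis and
    kappa1 non-zero coefficients inside each retained frequency slice."""

    def __init__(self, num_layers: int, d_out: int, d_in: int,
                 kappa1: int = 64, kappa2: int = 4):
        super().__init__()
        self.shape = (num_layers, d_out, d_in)
        # Fixed (non-trainable) frequency indices, chosen at random here for
        # illustration: kappa2 layer-frequencies and kappa1 (row, col) pairs.
        self.register_buffer("layer_freqs",
                             torch.randperm(num_layers)[:kappa2])
        flat = torch.randperm(d_out * d_in)[:kappa1]
        self.register_buffer("row_freqs", flat // d_in)
        self.register_buffer("col_freqs", flat % d_in)
        # Trainable complex coefficients: kappa2 * kappa1 in total,
        # independent of L, r, and d.
        self.coeff = nn.Parameter(torch.zeros(kappa2, kappa1, 2))

    def forward(self) -> torch.Tensor:
        """Return the dense stack of updates, shape (L, d_out, d_in)."""
        L, d_out, d_in = self.shape
        spec = torch.zeros(L, d_out, d_in, dtype=torch.complex64,
                           device=self.coeff.device)
        c = torch.view_as_complex(self.coeff)          # (kappa2, kappa1)
        spec[self.layer_freqs[:, None],
             self.row_freqs[None, :],
             self.col_freqs[None, :]] = c
        # Inverse 3D FFT maps the sparse spectrum to updates that vary
        # smoothly across the layer dimension.
        return torch.fft.ifftn(spec, dim=(0, 1, 2)).real
```

In this sketch, `CrossSpectraDelta(num_layers=32, d_out=4096, d_in=4096)()` would yield a tensor whose slice `l` is added to the frozen attention weight of layer `l`; the trainable parameter count is $2\kappa_1\kappa_2$ regardless of depth or width.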
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 2