Abstract: Large Language Models (LLMs) have become central to advances in artificial intelligence, particularly in machine learning, natural language processing, and computer vision. Their ability to understand and generate human-like text has made them crucial in applications ranging from automated translation to text generation. Despite the vast capabilities of pre-trained LLMs, their deployment in specialized domains often requires fine-tuning, an adaptation process constrained by high resource demands and long computation times. At present, the most prominent approach to accelerating fine-tuning is LoRA, which inserts trainable low-rank adapters into the model while freezing the remaining parameters. However, this state-of-the-art approach is limited by the need to manually attach adapters to every Transformer block, which introduces computational overhead. Novel fine-tuning strategies that overcome this limitation offer significant opportunities to reduce fine-tuning time without degrading quality. This paper addresses the challenge by introducing a fine-tuning method that dynamically assigns LoRA adapters to the model's Transformer blocks based on their importance and convergence status, significantly improving the efficiency of the process. Our method improves upon existing techniques, yielding an average fine-tuning speedup of 55% with no quality drop compared to LoRA.
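To make the LoRA mechanism mentioned above concrete, here is a minimal, self-contained sketch of a low-rank adapter attached to a frozen linear layer. This is an illustration of standard LoRA only, not the paper's dynamic assignment method; all names (`LoRALinear`, `r`, `alpha`) and the NumPy formulation are our own assumptions.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer W augmented with a trainable low-rank update B @ A.

    Only r * (d_in + d_out) adapter parameters are trained, instead of the
    full d_in * d_out weight matrix (illustrative sketch, not the paper's code).
    """

    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
        self.B = np.zeros((d_out, r))                    # trainable up-projection, zero-init
        self.scale = alpha / r                           # common LoRA scaling factor

    def forward(self, x):
        # y = x W^T + (alpha / r) * x A^T B^T
        # Since B is zero at initialization, the adapter initially adds nothing,
        # so the adapted layer starts out exactly equal to the base model.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(d_in=16, d_out=8, r=4)
x = np.ones((2, 16))
y = layer.forward(x)
```

Zero-initializing `B` is the standard LoRA choice: it guarantees the fine-tuned model matches the pretrained one before any adapter updates are applied.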
DOI: 10.3233/faia251317