Data-Efficient Adaptation of LLMs via Attention Head Reweighting

Published: 22 May 2026 · Last Modified: 05 May 2026 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: Learning effectively from limited data is a foundational problem in machine learning and is critical in domains such as security, where labeled examples are scarce. Large language models (LLMs) have demonstrated some capacity for data-efficient learning, especially through parameter-efficient adaptation methods, but they continue to struggle when given only a few samples for difficult tasks. To meet this challenge, we propose Attention Head Reweighting (AHR), a data-efficient method that adapts an LLM to a new text-classification task by learning only a single scalar per attention head. By exploiting the functional specialization of individual attention heads, AHR drastically reduces the number of parameters that must be learned. Experiments on diverse open-source text-classification datasets show that AHR can outperform standard baselines when learning from limited samples while modifying only 0.0001% of the model’s parameters.
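
To make the idea concrete, the following is a minimal PyTorch sketch of attention head reweighting on a single self-attention layer: one learnable scalar gates each head's output while all pretrained projection weights stay frozen. The module and parameter names are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of per-head reweighting; not the paper's official code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReweightedMultiHeadAttention(nn.Module):
    """Multi-head self-attention with one learnable scalar gate per head.

    Only the per-head gates are trainable; the (pretrained) projection
    weights are frozen, so adaptation adds num_heads parameters per layer.
    """

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads

        # Frozen projections standing in for the pretrained weights.
        self.qkv_proj = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        for p in (*self.qkv_proj.parameters(), *self.out_proj.parameters()):
            p.requires_grad = False

        # The only trainable parameters: one gate per attention head,
        # initialized to 1 so training starts from the pretrained behavior.
        self.head_gates = nn.Parameter(torch.ones(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, d_model = x.shape
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            return t.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = map(split_heads, (q, k, v))                # (B, H, T, Dh)
        attn_out = F.scaled_dot_product_attention(q, k, v)   # (B, H, T, Dh)

        # Reweight each head's output by its learned scalar.
        attn_out = attn_out * self.head_gates.view(1, -1, 1, 1)

        attn_out = attn_out.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.out_proj(attn_out)


if __name__ == "__main__":
    layer = ReweightedMultiHeadAttention(d_model=64, num_heads=8)
    x = torch.randn(2, 16, 64)
    print(layer(x).shape)  # torch.Size([2, 16, 64])
    print([n for n, p in layer.named_parameters() if p.requires_grad])  # ['head_gates']
```

Under this sketch, adapting a model with H heads per layer and L layers trains only L x H scalars, which is how the reported 0.0001% figure becomes plausible for a billion-parameter LLM.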