Joint Localization and Activation Editing for Low-Resource Fine-Tuning

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: An activation editing approach that jointly optimizes the selection of intervention components and the intervention strategy in low-resource settings.
Abstract: Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, are commonly used to adapt LLMs. However, the effectiveness of standard PEFT methods is limited in low-resource scenarios with only a few hundred examples. Recent advances in interpretability research have inspired the emergence of activation editing (or steering) techniques, which modify the activations of specific model components. Due to their extremely small parameter counts, these methods show promise for small datasets. However, their performance is highly dependent on identifying the correct modules to edit, and they often lack stability across different datasets. In this paper, we propose Joint Localization and Activation Editing (JoLA), a method that jointly learns (1) which heads in the Transformer to edit, (2) whether the intervention should be additive, multiplicative, or both, and (3) the intervention parameters themselves: the vectors applied as additive offsets or multiplicative scalings to the head outputs. Through evaluations on three benchmarks spanning commonsense reasoning, natural language understanding, and natural language generation, we demonstrate that JoLA consistently outperforms existing methods.
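To make the intervention described in the abstract concrete, below is a minimal, hypothetical sketch of a gated per-head activation edit in this spirit: each attention head's output can receive a learned additive offset and/or multiplicative scaling, with per-head gates controlling which heads are edited and which edit type is used. This is an illustration under assumptions, not the authors' implementation; in particular, the `GatedHeadEdit` name, the sigmoid gates, and the tensor shapes are chosen for clarity, and the paper's actual gating and parameterization (see the linked repository) may differ.

```python
import torch
import torch.nn as nn


class GatedHeadEdit(nn.Module):
    """Illustrative gated additive/multiplicative edit of attention-head outputs."""

    def __init__(self, num_heads: int, head_dim: int):
        super().__init__()
        # Per-head intervention parameters (zero-initialized -> identity edit).
        self.add_offset = nn.Parameter(torch.zeros(num_heads, head_dim))  # additive vector per head
        self.mul_scale = nn.Parameter(torch.zeros(num_heads, head_dim))   # multiplicative vector per head
        # Per-head gate logits deciding whether the additive / multiplicative
        # term is applied at all (soft relaxation of the "which heads" choice).
        self.gate_add = nn.Parameter(torch.zeros(num_heads))
        self.gate_mul = nn.Parameter(torch.zeros(num_heads))

    def forward(self, head_out: torch.Tensor) -> torch.Tensor:
        # head_out: (batch, seq_len, num_heads, head_dim)
        g_a = torch.sigmoid(self.gate_add).view(1, 1, -1, 1)  # soft gates in [0, 1]
        g_m = torch.sigmoid(self.gate_mul).view(1, 1, -1, 1)
        scale = 1.0 + g_m * self.mul_scale    # reduces to identity when mul_scale is zero
        offset = g_a * self.add_offset        # reduces to zero when add_offset is zero
        return head_out * scale + offset


# Usage sketch: attach the edit to each attention layer's per-head output
# (e.g. via a forward hook) and train only these edit parameters on the small dataset.
edit = GatedHeadEdit(num_heads=32, head_dim=128)
dummy = torch.randn(2, 16, 32, 128)
edited = edit(dummy)  # same shape; starts as an identity transformation
```

Because the vectors are zero-initialized, the edit starts as the identity, and training on a few hundred examples only has to learn which heads to intervene on and how, which is the low-parameter-count property the abstract highlights.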
Lay Summary: (1) We introduce JoLA, a novel activation editing approach that jointly optimizes the selection of intervention components and the intervention strategy, specifically tailored for low-resource scenarios. (2) We demonstrate that JoLA achieves stable and consistent performance across diverse tasks, addressing key limitations of existing methods. We further validate its effectiveness across different data scales and model sizes. (3) We provide new insights into the role of attention heads in activation editing, showing that they are the most impactful components for fine-tuning.
Link To Code: https://github.com/wenlai-lavine/jola
Primary Area: Deep Learning->Large Language Models
Keywords: activation editing, low-resource, localization
Submission Number: 13402