Keywords: GPT-2, Rogue Dimensions, Fine-Tuning
Abstract: Although transformer decoders are quickly becoming the most prominent NLP models, little is known about how they embed text in vector space and make decisions on downstream tasks. In this study, we evaluate the impact of fine-tuning on how GPT-2 represents text in vector space. In particular, we demonstrate that fine-tuning refines the last half of the network, and that task-specific information is encoded into what the literature refers to as "rogue dimensions". In contrast to previous work, we find that the rogue dimensions that emerge when fine-tuning GPT-2 are influential in the model's decision-making process. By applying a linear threshold to a single rogue dimension, we can complete downstream classification tasks with an error of only 1.6% relative to the full 768-dimensional representations of GPT-2.
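The core idea of the abstract (classifying by thresholding one coordinate of a GPT-2 representation) can be illustrated with a minimal sketch. The dimension index `ROGUE_DIM` and the decision boundary `THRESHOLD` below are hypothetical placeholders, not values from the paper, and the base `gpt2` checkpoint stands in for the fine-tuned model the authors actually analyze.

```python
# Minimal sketch: classify text by thresholding a single "rogue" dimension
# of GPT-2's final hidden state. ROGUE_DIM and THRESHOLD are assumed values;
# in practice they would be identified and fit on the fine-tuned model.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

ROGUE_DIM = 138   # hypothetical index of a high-magnitude "rogue" dimension
THRESHOLD = 0.0   # hypothetical linear decision boundary, fit on labeled data

def classify(text: str) -> int:
    """Return a 0/1 label by thresholding one coordinate of the
    last token's 768-dimensional hidden representation."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
    value = hidden[0, -1, ROGUE_DIM].item()         # last-token activation
    return int(value > THRESHOLD)

print(classify("This movie was fantastic!"))
```

In a real replication, one would first locate the rogue dimension (e.g., by activation variance across the fine-tuned network's last layers) and then fit the threshold on a labeled training split before evaluating.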
Paper Type: short
Research Area: Interpretability and Analysis of Models for NLP