Keywords: Applications of interpretability
TL;DR: In code LMs, lower feed-forward layers capture syntax and higher ones encode semantics, and the concepts they store can be edited without hurting performance.
Abstract: Language Models (LMs) have proven useful for code-related tasks, and several code LMs have been proposed recently.
The majority of studies in this direction focus only on improving LM performance on various benchmarks, treating the models as black boxes. Beyond this, a handful of works attempt to understand the role of attention layers in code LMs.
Nonetheless, feed-forward layers, which account for two-thirds of a typical transformer model's parameters, remain under-explored.
In this work, we attempt to gain insights into the inner workings of code language models by examining the feed-forward layers.
We focus on the organization of stored concepts, the editability of these concepts, and the roles that different layers and variations in input context size play in output generation.
Our empirical findings demonstrate that lower layers capture syntactic patterns, while higher layers encode abstract concepts and semantics.
We show that concepts of interest can be edited within feed-forward layers without compromising code LM performance.
We anticipate these findings will facilitate better understanding, debugging, and testing of code LMs.
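To make the key-value memory view of feed-forward layers concrete, the sketch below inspects a single FF "memory slot" in a small code LM and crudely edits it by zeroing its value vector. This is a minimal illustration, not the paper's actual procedure: the model name codeparrot/codeparrot-small, the layer and slot indices, and the GPT-2-style architecture with tied input/output embeddings are all assumptions made for the example.

```python
# Minimal sketch (not the paper's code) of the key-value memory view of
# feed-forward layers. Assumptions: "codeparrot/codeparrot-small" as a small
# GPT-2-style code LM, arbitrary layer/slot indices, and tied embeddings so
# lm_head reads hidden vectors in vocabulary space.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codeparrot/codeparrot-small"  # illustrative code LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer, slot = 2, 100  # hypothetical lower layer and FF memory slot
# In GPT-2-style blocks, mlp.c_proj.weight has shape (intermediate, hidden):
# row `slot` is the value vector added to the residual stream when key `slot` fires.
value_vec = model.transformer.h[layer].mlp.c_proj.weight[slot]

# "Logit lens"-style read-out: the tokens this value vector promotes most
# strongly hint at the concept stored in the slot.
with torch.no_grad():
    top = torch.topk(model.lm_head(value_vec), k=10).indices
print([tok.decode(int(t)) for t in top])

# A crude edit: suppress the concept by zeroing its value vector. The paper's
# actual editing procedure may differ; this only illustrates that individual
# FF weights can be modified surgically.
with torch.no_grad():
    model.transformer.h[layer].mlp.c_proj.weight[slot].zero_()
```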
Submission Number: 162