Abstract: While Transformers and their derivatives achieve strong performance across a wide range of NLP tasks, understanding their internal mechanisms remains challenging. Mainstream interpretability research often focuses solely on numerical attributes, neglecting the rich semantic structure inherent in these models. To address this gap, we develop the SITH (Semantic Interpreter for Transformer Hierarchy) framework. We focus on creating universal text representation methods and uncovering the semantic principles of the Transformer's hierarchical structure. We use the convex hull method to represent sequence semantics in an n-dimensional semantic Euclidean space and analyze changes in semantic quality and quantity across the convex hull's three geometric primitives: points, lines, and surfaces. Our analysis takes a dual perspective: a cumulative view across stacked layers and a layer-to-layer shift view. Applied to machine translation, our results reveal potential semantic processes and highlight the effects of layer stacking and hierarchical differences. These insights are valuable for tuning hyperparameters of the encoder and decoder layers.
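The abstract's core geometric idea can be illustrated with a minimal sketch: treat each token's embedding as a point, project the sequence to a low-dimensional space, and summarize its convex hull through the three primitives the paper names (points, lines, surfaces). The projection via SVD and the 2-D setting here are illustrative assumptions, not the paper's actual SITH implementation.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_semantics(embeddings: np.ndarray, n_dims: int = 2) -> dict:
    """Summarize a sequence's convex hull in a reduced semantic space.

    A hypothetical sketch: project token embeddings to `n_dims` via SVD,
    then report hull statistics for the three geometric primitives.
    """
    # Center the embeddings and project onto the top principal directions
    # (a stand-in for whatever dimensionality reduction the paper uses).
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_dims].T

    hull = ConvexHull(projected)
    return {
        "n_vertices": len(hull.vertices),  # points on the hull boundary
        "perimeter": hull.area,            # in 2-D, .area is the perimeter (lines)
        "area": hull.volume,               # in 2-D, .volume is the enclosed area (surface)
    }

# Toy example: 12 tokens with 16-dimensional embeddings.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(12, 16))
stats = hull_semantics(tokens)
```

Comparing `stats` computed from the hidden states of successive layers would give the cumulative and layer-to-layer views the abstract describes.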
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Languages Studied: python