Abstract: While Transformers and their derivatives have shown strong performance in various NLP tasks, understanding their internal mechanisms remains challenging. Mainstream interpretability research often focuses solely on numerical attributes, neglecting the complex semantic structure inherent in these models. We have developed the SITH (Semantic Interpreter for Transformer Hierarchy) framework to address this issue. We focus on creating universal text representation methods and uncovering the semantic principles of the Transformer's hierarchical structure. We use the convex hull method to represent sequence semantics in an n-dimensional semantic Euclidean space and define evaluation indicators from the convex hull to analyze changes in semantic quality and quantity. Our analysis adopts a dual perspective: multi-layer cumulative effects and individual layer-to-layer shifts. When applied to machine translation, our results reveal potential semantic processes and highlight the effectiveness of layer stacking and hierarchical differences. These insights are valuable for tuning hyperparameters of encoder and decoder layers.
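To make the convex-hull idea concrete, here is a minimal sketch, not the paper's SITH implementation: it treats each layer's token embeddings as points in a semantic space and uses the volume of their convex hull as one possible indicator of semantic quantity. The helper name `layerwise_hull_volumes`, the choice of `bert-base-uncased`, and the PCA projection to a few dimensions (exact hulls are intractable in the full hidden dimension) are all assumptions for illustration.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer


# Hypothetical helper: compute a convex-hull-volume indicator for the token
# embeddings at every layer of a Transformer encoder. This is an illustrative
# sketch; the paper's actual indicators and space construction may differ.
def layerwise_hull_volumes(text: str,
                           model_name: str = "bert-base-uncased",
                           n_components: int = 3) -> list[float]:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

    inputs = tokenizer(text, return_tensors="pt")
    # hidden_states is a tuple of (num_layers + 1) tensors of shape
    # [batch=1, seq_len, hidden_dim], including the embedding layer.
    hidden_states = model(**inputs).hidden_states

    volumes = []
    for layer in hidden_states:
        points = layer[0].detach().numpy()  # [seq_len, hidden_dim]
        # Project to a few principal components before building the hull
        # (an assumption made here for tractability; requires seq_len to
        # exceed n_components so the hull is non-degenerate).
        reduced = PCA(n_components=n_components).fit_transform(points)
        volumes.append(ConvexHull(reduced).volume)
    return volumes
```

Plotting the returned volumes against layer index gives one view of how the "spread" of sequence semantics grows or shrinks through the hierarchy, which is the kind of cumulative, layer-by-layer analysis the abstract describes.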