SITH: Semantic Interpreter for Transformer Hierarchy

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
Abstract: While Transformers and their derivatives have shown strong performance across a range of NLP tasks, understanding their internal mechanisms remains challenging. Mainstream interpretability research often focuses solely on numerical attributes, neglecting the rich semantic structure inherent in the model. We develop the SITH (Semantic Interpreter for Transformer Hierarchy) framework to address this gap, focusing on a universal text representation method and on uncovering the semantic principles of the Transformer's hierarchical structure. We represent sequence semantics as a convex hull in an n-dimensional semantic Euclidean space and analyze changes in semantic quality and quantity across the hull's three geometric levels: point, line, and surface. Our analysis takes a dual perspective: a multi-layer cumulative view and an individual layer-to-layer shift view. Applied to machine translation, our results reveal potential semantic processes and highlight the effects of layer stacking and hierarchical differences. These insights are useful for tuning hyperparameters of the encoder and decoder layers.
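The abstract's core idea, representing a sequence's semantics by the convex hull of its token embeddings, can be illustrated with a minimal sketch. The paper's actual construction is not specified here, so the following is an assumption-laden toy: it projects token embeddings onto their top two principal components (via SVD) and measures the area of the 2-D convex hull (Andrew's monotone chain plus the shoelace formula) as a rough proxy for how much semantic space a layer's representations cover. The function names (`semantic_spread`, `convex_hull_2d`, `hull_area`) are hypothetical, not from the paper.

```python
import numpy as np

def convex_hull_2d(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(vertices):
    """Shoelace formula: area of a polygon given CCW vertices."""
    a = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        a += x1 * y2 - x2 * y1
    return abs(a) / 2.0

def semantic_spread(embeddings):
    """Project token embeddings to their top two principal components
    (PCA via SVD) and return the convex-hull area of the projection,
    a toy proxy for the 'semantic coverage' of one layer's sequence."""
    X = embeddings - embeddings.mean(axis=0)  # center before SVD
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ vt[:2].T  # shape: (num_tokens, 2)
    return hull_area(convex_hull_2d(proj))

# Toy input: 12 tokens with 64-dim embeddings (stand-in for one layer's output).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(12, 64))
area = semantic_spread(tokens)
```

Comparing `semantic_spread` across layers (the cumulative view) or between consecutive layers (the layer-to-layer shift view) would mirror the dual perspective the abstract describes, though the paper's own metrics over points, lines, and surfaces are presumably richer than a single area.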
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Languages Studied: python