Keywords: Contextual Word Embedding, Category Theory, Manifold
Abstract: The prevailing understanding of contextual word embeddings is “token-centric”: each token is associated with a vector that is dynamically modulated by its context. However, this understanding does not explain how a model represents the context itself, leaving context representation largely uncharacterized. In this work, to establish a rigorous definition of “context representation”, we formalize this intuition within a category theory framework, which shows that such a representation must capture both the tokens themselves and how transitions occur among tokens in a given context. As a practical instantiation of our theoretical understanding, we also show how a manifold learning method can characterize how a representation model (i.e., BERT) encodes different contexts and how a context representation changes as it passes through components such as attention and the FFN. We hope this novel theoretical perspective sheds light on further improvements to Transformer-based language representation models.
One-sentence Summary: A theory-grounded analysis of how contexts are represented by contextual word embeddings
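As a rough illustration of the kind of analysis the abstract describes, the sketch below (not the paper's actual method; the model choice, mean pooling, and the use of Isomap are all illustrative assumptions) extracts per-layer BERT hidden states for several contexts and embeds one layer's pooled representations with a manifold learning method:

```python
# Hypothetical sketch: probe how BERT's hidden states for different contexts
# arrange themselves on a low-dimensional manifold, layer by layer.
import torch
from transformers import BertTokenizer, BertModel
from sklearn.manifold import Isomap

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

contexts = [
    "The bank raised interest rates.",
    "The river bank was covered in moss.",
    "She deposited money at the bank.",
    "They picnicked on the grassy bank.",
]

# One pooled vector per context per layer (mean over tokens -- an assumed,
# simplistic stand-in for a true "context representation").
layer_reps = []
with torch.no_grad():
    for text in contexts:
        inputs = tokenizer(text, return_tensors="pt")
        hidden_states = model(**inputs).hidden_states  # tuple of (layers+1) x [1, seq, dim]
        layer_reps.append(torch.stack([h.mean(dim=1).squeeze(0) for h in hidden_states]))

# layer_reps: list of [num_layers+1, dim] tensors; pick one layer and embed the
# contexts with a manifold learner to inspect their geometry.
layer = 8
X = torch.stack([r[layer] for r in layer_reps]).numpy()
embedding = Isomap(n_neighbors=2, n_components=2).fit_transform(X)
print(embedding)  # 2-D coordinates, one point per context
```

Repeating this across layers (or before and after the attention and FFN sublayers) would give a layer-by-layer picture of how the pooled context vectors move on the learned manifold, in the spirit of the analysis described above.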