Abstract: Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
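The following is a minimal illustrative sketch (not the authors' code, which is linked below) of the kind of measurement the abstract alludes to: extracting per-token contextual embeddings from BERT and computing pairwise squared Euclidean distances, the quantity one would compare against parse-tree distances when studying the geometry of syntactic representations. It assumes the HuggingFace `transformers` and `torch` packages; the layer index is an arbitrary choice for illustration.

```python
# Sketch: per-token BERT embeddings and their pairwise squared distances.
# Assumes `pip install torch transformers`; not the paper's implementation.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentence = "The chef who ran to the store was out of food"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple (embedding layer + 12 transformer layers),
# each of shape (batch, seq_len, hidden_dim). Pick one layer to inspect.
layer = 7  # illustrative choice, not a value prescribed by the paper
vecs = outputs.hidden_states[layer][0]        # (seq_len, 768)

# Pairwise squared Euclidean distances between token embeddings.
sq_dists = torch.cdist(vecs, vecs, p=2) ** 2  # (seq_len, seq_len)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(tokens)
print(sq_dists)
```

Comparing a matrix like `sq_dists` (restricted to a learned syntactic subspace) with distances between words in a dependency parse is one way to probe whether tree structure is encoded geometrically in the embeddings.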
Code Links:
Tree visualization: https://github.com/PAIR-code/interpretability/tree/master/bert-tree
Word sense visualization: https://github.com/PAIR-code/interpretability/tree/master/context-atlas
Word sense disambiguation experiment: https://drive.google.com/open?id=1Y14MACuFdR2z2eWifQhp7t78mixKYLwZ
Concatenation experiment: https://drive.google.com/open?id=1bbwPacX8OYBoEaz8Btxbolihyfzvg2nJ
Semantic probe: https://drive.google.com/open?id=1Y14MACuFdR2z2eWifQhp7t78mixKYLwZ
CMT Num: 4632