Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT’s Representations
Abstract: We introduce Metric-Learning Encoding Models (MLEMs) as a new approach to understanding neural representations of sentences and their linguistic features (e.g., tense, subject person, object number). MLEMs can detect both local and distributed representations. As a proof of concept, we apply MLEMs to neural representations extracted from BERT and find that: (1) there is an order among linguistic features, which separate sentence representations to different degrees in different layers; (2) in some layers, neural representations are organized hierarchically, with clusters nested within larger clusters, separated by linguistic features at different scales; (3) in some layers (most strikingly BERT's middle layer five), linguistic features are strongly disentangled, that is, represented within distinct clusters of selective units; (4) MLEMs are more robust to type-I errors than multivariate decoding methods and outperform univariate encoding methods in predicting neural activity. Together, these results demonstrate the utility of Metric-Learning Encoding Models for studying how linguistic features are neurally encoded in language models, and their advantages over traditional methods. MLEMs can be extended to other domains (e.g., vision) and to other neural systems, such as the human brain.
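The abstract does not spell out the MLEM formulation, but the core idea it describes can be sketched as follows: pairwise distances between sentence representations are regressed on indicators of which linguistic features differ between the two sentences, so the fitted weights rank how strongly each feature separates the representations in a given layer. The function name `mlem_fit`, the Euclidean distance, and the ridge regressor below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a metric-learning encoding model (illustrative, not the
# paper's code): predict pairwise representational distances from binary
# feature-mismatch vectors and inspect the fitted weights per feature.
import numpy as np
from itertools import combinations
from sklearn.linear_model import Ridge

def mlem_fit(reps, feature_table):
    """reps: (n_sentences, n_units) layer activations;
    feature_table: (n_sentences, n_features) categorical feature codes."""
    pairs = list(combinations(range(len(reps)), 2))
    # Target: distance between the two sentences' representations
    # (Euclidean here; an assumption for illustration).
    y = np.array([np.linalg.norm(reps[i] - reps[j]) for i, j in pairs])
    # Predictors: 1 if the pair differs on a given linguistic feature, else 0.
    X = np.array([(feature_table[i] != feature_table[j]).astype(float)
                  for i, j in pairs])
    model = Ridge(alpha=1.0).fit(X, y)
    return model  # model.coef_ orders features by how much they separate sentences

# Toy usage: 8 sentences, 16 units, 3 hypothetical features
# (e.g., tense, subject person, object number) coded as integers.
rng = np.random.default_rng(0)
reps = rng.normal(size=(8, 16))
feats = rng.integers(0, 2, size=(8, 3))
print(mlem_fit(reps, feats).coef_)
```

Repeating such a fit layer by layer would yield one weight profile per layer, which is the kind of per-layer feature ordering the abstract refers to.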
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English