Abstract: Transformer-encoder architectures for language modeling provide rich contextualized vectors that represent both syntactic and semantic information captured during pre-training. These vectors are useful for multiple downstream tasks, but directly using the final-layer representations may hide interesting elements represented in the hidden layers. In this paper, we propose Shact (Syntactic Hierarchical Agglomerative Clustering from Transformer-Encoders), a model that disentangles syntactic span representations from these hidden representations into a latent vector space. In our model, spans are expressed in terms of token distances. We propose a loss function that optimizes the neural disentanglement model from ground-truth spans, and we integrate the latent-space vectors into a two-phase model via hierarchical clustering, suitable for multiple span recognition tasks. We evaluated our approach on flat and nested named entity recognition as well as chunking, showing the model's ability to discover these spans while achieving competitive results on the full recognition and classification tasks.
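As a rough illustration of the two-phase idea sketched above (not the authors' Shact implementation), the snippet below applies hierarchical agglomerative clustering to toy latent token vectors; all variable names, the latent dimensionality, and the distance threshold are assumptions made for the example.

```python
# Minimal sketch, assuming per-token latent vectors already produced by some
# learned projection of transformer hidden states (toy random data here).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 16))  # 8 tokens, 16-dim latent space (hypothetical)

# Agglomerative clustering over pairwise token distances in the latent space;
# cutting the dendrogram at a distance threshold yields candidate spans.
Z = linkage(latent, method="average", metric="euclidean")
span_ids = fcluster(Z, t=1.5, criterion="distance")
print(span_ids)  # tokens sharing an id form a candidate span
```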