Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

Shauli Ravfogel; Yanai Elazar; Jacob Goldberger; Yoav Goldberg

25 Sept 2019 (modified: 12 Oct 2025) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: dismantlement, contextualized word representations, language models, representation learning

TL;DR: We distill syntactic information from language model representations using unsupervised metric learning.

Abstract: Contextualized word representations, such as ELMo and BERT, have been shown to perform well on a variety of semantic and structural (syntactic) tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural language representations: we aim to learn a transformation of the contextualized vectors that discards the lexical semantics but keeps the structural information. To this end, we automatically generate groups of sentences that are structurally similar but semantically different, and use a metric-learning approach to learn a transformation that emphasizes the structural component encoded in the vectors. We demonstrate that our transformation clusters vectors in space by structural properties rather than by lexical semantics. Finally, we demonstrate the utility of our distilled representations by showing that they outperform the original contextualized representations in a few-shot parsing setting.
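
The abstract names only a "metric-learning approach", so the following is a minimal sketch of one possible instance rather than the authors' method. It assumes a hinge triplet loss, a linear map `W`, and Euclidean distance; the function name `triplet_loss` and the `margin` parameter are hypothetical. Anchors and positives stand for vectors of words from structurally similar sentences, and negatives for vectors from structurally different ones.

```python
import numpy as np

def triplet_loss(W, anchor, positive, negative, margin=1.0):
    """Mean hinge triplet loss on linearly transformed vectors.

    Assumption: W is a (d_in, d_out) linear map standing in for the
    learned transformation; the loss choice is illustrative only.
    """
    a, p, n = anchor @ W, positive @ W, negative @ W
    d_pos = np.linalg.norm(a - p, axis=1)  # distance to structurally similar item
    d_neg = np.linalg.norm(a - n, axis=1)  # distance to structurally different item
    # Penalize triplets where the similar item is not closer by at least `margin`.
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Toy usage with random stand-ins for contextualized vectors.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
anchor, positive, negative = (rng.normal(size=(5, 8)) for _ in range(3))
loss = triplet_loss(W, anchor, positive, negative)
```

Minimizing such a loss over `W` would pull structurally similar vectors together regardless of their lexical content, which is the behavior the abstract describes.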

Code: https://drive.google.com/file/d/1tGoYmNCOSTgE7T5RjRv_bV7o3JUD160t/view

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/unsupervised-distillation-of-syntactic/code)

Original Pdf: pdf
