BURT: BERT-inspired Universal Representation from Learning Meaningful Segment

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: language modeling
Abstract: Although pre-trained contextualized language models such as BERT achieve significant performance on various downstream tasks, current language representations still focus on a linguistic objective at a single granularity, which is not applicable when multiple levels of linguistic units are involved at the same time. We therefore present a universal representation model, BURT (BERT-inspired Universal Representation from learning meaningful segmenT), to encode different levels of linguistic units into the same vector space. Specifically, we extract and mask meaningful segments based on point-wise mutual information (PMI) to incorporate objectives of different granularities into the pre-training stage. Our model surpasses BERT and BERT-wwm-ext on a wide range of downstream tasks in the ChineseGLUE (CLUE) benchmark. In particular, BURT-wwm-ext obtains 74.48% on the WSC test set, a 3.45-point absolute improvement over its baseline model. We further verify the effectiveness of our unified pre-training strategy in two real-world text matching scenarios. As a result, our model significantly outperforms existing information retrieval (IR) methods and yields universal representations that can be directly applied to retrieval-based question answering and natural language generation tasks.
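The abstract describes extracting and masking high-PMI segments during pre-training but does not include code on this page. The sketch below is only a rough illustration of how candidate segments could be scored by point-wise mutual information over bigrams; the function name pmi_segments, the threshold parameter, and the toy corpus are our own assumptions, not taken from the paper.

```python
import math
from collections import Counter

def pmi_segments(corpus, threshold=3.0):
    """Score adjacent token bigrams by PMI and return those above a
    threshold as candidate 'meaningful segments' for whole-segment masking.
    `corpus` is a list of tokenized sentences (lists of tokens)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    segments = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi                 # joint probability of the bigram
        p_x = unigrams[w1] / n_uni          # marginal probabilities
        p_y = unigrams[w2] / n_uni
        pmi = math.log(p_xy / (p_x * p_y))  # PMI = log p(x,y) / (p(x) p(y))
        if pmi >= threshold:
            segments[(w1, w2)] = pmi
    return segments

# Toy example: bigrams that co-occur far more often than chance score high
corpus = [["new", "york", "is", "big"],
          ["new", "york", "city"],
          ["the", "city", "is", "new"]]
print(pmi_segments(corpus, threshold=1.0))
```

Segments selected this way could then be masked as whole units during pre-training, analogously to whole-word masking but at a coarser granularity.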
One-sentence Summary: We present BURT to learn embeddings of different levels of linguistic units in the same vector space.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=pb0YvTftJA