Prior Knowledge Representation for Self-Attention Networks

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Prior Knowledge, Universal Representation, Self-Attention Networks, Neural Machine Translation
Abstract: Self-attention networks (SANs) have shown promising empirical results in various natural language processing tasks. Typically, they gradually learn language knowledge from the whole training dataset through parallel and stacked layers, thereby modeling the language representation. In this paper, we propose a simple and general representation method that incorporates prior knowledge related to language representation from the beginning of training. The proposed method also allows SANs to leverage prior knowledge in a universal way that is compatible with neural networks. Furthermore, we apply it to two kinds of prior knowledge: word frequency knowledge for monolingual data and translation lexicon knowledge for bilingual data, thereby enhancing the language representation. Experimental results on the WMT14 English-to-German and WMT17 Chinese-to-English translation tasks demonstrate the effectiveness and universality of the proposed method over a strong Transformer-based baseline.
One-sentence Summary: This work explores a universal prior knowledge representation approach to self-attention networks.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=hE5xAHs4oA
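
The abstract does not spell out how the prior knowledge enters the network, so the following PyTorch sketch is only one plausible reading: a learned embedding of a discrete prior signal (here, hypothetical word-frequency buckets) is added to the token embeddings before the self-attention layers. The names `PriorKnowledgeEmbedding` and `freq_bucket_ids` are illustrative assumptions, not the paper's published interface.

```python
# Minimal sketch (assumption): inject a prior-knowledge embedding into the
# token representation before the Transformer encoder's self-attention.
import torch
import torch.nn as nn


class PriorKnowledgeEmbedding(nn.Module):
    """Adds a learned embedding of a discrete prior signal to token embeddings."""

    def __init__(self, num_buckets: int, d_model: int):
        super().__init__()
        self.prior_embed = nn.Embedding(num_buckets, d_model)

    def forward(self, token_embeds: torch.Tensor, prior_ids: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, d_model); prior_ids: (batch, seq_len)
        return token_embeds + self.prior_embed(prior_ids)


# Toy usage: 4 frequency buckets, model dimension 512.
d_model, num_buckets = 512, 4
tok_embed = nn.Embedding(32000, d_model)                    # standard token embedding
prior = PriorKnowledgeEmbedding(num_buckets, d_model)
encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

tokens = torch.randint(0, 32000, (2, 10))                   # (batch, seq_len)
freq_bucket_ids = torch.randint(0, num_buckets, (2, 10))    # would be precomputed from corpus counts
x = prior(tok_embed(tokens), freq_bucket_ids)               # prior-augmented representation
out = encoder_layer(x)                                      # consumed by self-attention as usual
```

For the bilingual case, the same interface could in principle carry translation-lexicon signals (e.g., bucketing source tokens by lexicon coverage), which is what makes the injection point model-agnostic.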