Abstract: Sequence-to-sequence models have achieved great success in Neural Machine Translation (NMT). However, word embeddings learned by NMT models tend to degenerate and collapse into a narrow cone, a phenomenon known as the {\em representation degeneration problem}, which limits the representational capacity of the embeddings. In this paper, we propose a Contrastive Word Embedding Learning (CWEL) method to address this problem. CWEL combines contrastive representation learning with embedding regularization, adaptively minimizing the cosine similarity between target-side word embeddings according to their semantic similarity. Experiments on multiple translation benchmarks show that CWEL significantly improves translation quality. Further analysis shows that the improvements mainly come from the better-learned word embeddings.
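The abstract does not give the exact loss, but the core idea (pushing apart target-side embeddings in proportion to how semantically dissimilar the words are, to counteract the narrow-cone degeneration) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the function name `cwel_regularizer`, the margin parameter, and the assumption that a pairwise semantic-similarity matrix `sim` is available (e.g. precomputed from an external resource) are all hypothetical.

```python
import torch
import torch.nn.functional as F

def cwel_regularizer(emb: torch.Tensor, sim: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
    """Hypothetical sketch of an adaptive contrastive embedding regularizer.

    emb: (V, d) target-side word embedding matrix.
    sim: (V, V) semantic-similarity weights in [0, 1], assumed given;
         higher means the word pair is more semantically similar.

    Penalizes pairwise cosine similarity, weighted by (1 - sim) so that
    dissimilar word pairs are pushed apart most strongly, counteracting
    the collapse of all embeddings into a narrow cone.
    """
    e = F.normalize(emb, dim=1)                 # unit-normalize each embedding
    cos = e @ e.t()                             # pairwise cosine similarities
    mask = 1.0 - torch.eye(emb.size(0))         # exclude self-similarity terms
    weight = (1.0 - sim) * mask                 # dissimilar pairs get larger weight
    penalty = torch.clamp(cos - margin, min=0.0)
    return (weight * penalty).sum() / mask.sum()
```

In training, a term like this would be added to the usual NMT cross-entropy loss with a small coefficient, so translation accuracy and embedding geometry are optimized jointly.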