Keywords: Unsupervised Neural Machine Translation, Marginal Likelihood Maximization, Mutual Information
Abstract: Unsupervised Neural Machine Translation (UNMT) has received great attention
in recent years. Although tremendous empirical improvements have been achieved,
theory-oriented investigation is still lacking, and thus fundamental
questions, such as \textit{why} a certain training protocol works and under
\textit{what} circumstances, are not yet well understood. This paper
attempts to provide theoretical insights into these questions. Specifically,
following the methodology of comparative study, we leverage two perspectives,
i) \textit{marginal likelihood maximization} and ii) \textit{mutual information}
from information theory, to understand the different learning effects of the
standard training protocol and its variants. Our detailed analyses reveal
several critical conditions for the successful training of UNMT.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We attempt to demystify why DAE+BT training can lead to a successfully trained UNMT model with decent performance.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=72ZbiHcz2M