- Keywords: self-training, semi-supervised learning, neural sequence generatioin
- TL;DR: We revisit self-training as a semi-supervised learning method for neural sequence generation problem, and show that self-training can be quite successful with injected noise.
- Abstract: Self-training is one of the earliest and simplest semi-supervised methods. The key idea is to augment the original labeled dataset with unlabeled data paired with the model’s prediction. Self-training has mostly been well-studied to classification problems. However, in complex sequence generation tasks such as machine translation, it is still not clear how self-training woks due to the compositionality of the target space. In this work, we first show that it is not only possible but recommended to apply self-training in sequence generation. Through careful examination of the performance gains, we find that the noise added on the hidden states (e.g. dropout) is critical to the success of self-training, as this acts like a regularizer which forces the model to yield similar predictions for similar inputs from unlabeled data. To further encourage this mechanism, we propose to inject noise to the input space, resulting in a “noisy” version of self-training. Empirical study on standard benchmarks across machine translation and text summarization tasks under different resource settings shows that noisy self-training is able to effectively utilize unlabeled data and improve the baseline performance by large margin.