Improving Machine Translation by Searching Skip Connections Efficiently

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: network morphism, machine translation, neural architecture search
Abstract: As a widely used neural network model in NLP (natural language processing), the Transformer achieves state-of-the-art performance on several translation tasks. The Transformer has a fixed skip connection architecture across its layers, but the influence of other possible skip connection architectures has not been fully explored. We search over different skip connection architectures to discover better ones on different datasets. To make trying different skip connection architectures efficient, we apply the idea of network morphism and add skip connections as a fine-tuning procedure. Our fine-tuning method outperforms the best models trained on the same or smaller datasets on WMT'16 En-De, WMT'14 En-Fr, and WMT'18 En-De with 226M back-translation sentences. We also experiment with transferring searched skip connection architectures to new Transformer models.
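To illustrate the network-morphism idea described in the abstract, the following is a minimal PyTorch sketch of adding a new skip connection in a function-preserving way: the new path is scaled by a gate initialised to zero, so the model's output is unchanged at insertion time and the gate is learned during fine-tuning. The module name, the gating scheme, and the use of LayerNorm on the skipped activations are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn as nn

class MorphedSkipConnection(nn.Module):
    """Hypothetical sketch: add a skip connection from an earlier layer's
    output to a later layer's input without changing the network's function
    at insertion time (network morphism). The new path is scaled by a gate
    initialised to zero, so fine-tuning starts from the original model."""

    def __init__(self, d_model: int):
        super().__init__()
        # Zero-initialised gate: the new path contributes nothing at first.
        self.gate = nn.Parameter(torch.zeros(1))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, later_input: torch.Tensor,
                earlier_output: torch.Tensor) -> torch.Tensor:
        # New skip path is inactive until the gate moves away from zero.
        return later_input + self.gate * self.norm(earlier_output)

if __name__ == "__main__":
    d_model = 512
    skip = MorphedSkipConnection(d_model)
    x_later = torch.randn(8, 10, d_model)    # activations entering layer j
    x_earlier = torch.randn(8, 10, d_model)  # activations from layer i < j
    out = skip(x_later, x_earlier)
    # Function-preserving at initialisation: output equals the original input.
    assert torch.allclose(out, x_later)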
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=KvrTCL-sW