BERT-JAM: Maximizing the utilization of BERT for neural machine translation

Zhebin Zhang, Sai Wu, Dawei Jiang, Gang Chen

Published: 2021, Last Modified: 21 Jan 2026. Venue: Neurocomputing 2021. License: CC BY-SA 4.0.

Abstract: Highlights
• Employs joint attention for the incorporation of BERT into NMT models.
• Makes use of the representations of BERT’s intermediate layers.
• Employs a three-phase optimization strategy to overcome catastrophic forgetting.
• Studies how the size of BERT impacts the performance of NMT models.
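The highlights name two mechanisms, joint attention and the use of BERT's intermediate layers, without describing how either works. The block below is a minimal, illustrative sketch that rests on two stated assumptions. First, it assumes intermediate layers are fused by a learned softmax-weighted sum (a common "scalar mix" technique, which is not necessarily the paper's exact aggregation). Second, it assumes "joint attention" means a single attention softmax over the concatenation of an NMT layer's own states and the aggregated BERT states. The function names (`aggregate_layers`, `joint_attention`) are hypothetical. The sketch uses single-head attention with no learned projections, in pure Python for readability.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def aggregate_layers(layers: List[List[Vector]], logits: List[float]) -> List[Vector]:
    """Hypothetical layer aggregation (assumption, not the paper's method):
    a softmax-weighted sum of per-layer BERT outputs.
    layers[l][t] is the vector for token t at BERT layer l;
    logits[l] is a learnable score for layer l."""
    weights = softmax(logits)
    num_tokens, dim = len(layers[0]), len(layers[0][0])
    out = []
    for t in range(num_tokens):
        vec = [0.0] * dim
        for w, layer in zip(weights, layers):
            for d in range(dim):
                vec[d] += w * layer[t][d]
        out.append(vec)
    return out


def joint_attention(queries: List[Vector], own: List[Vector], bert: List[Vector]) -> List[Vector]:
    """Hypothetical joint attention (assumption): scaled dot-product attention
    whose keys and values are the model's own states followed by the BERT
    states, so each query spreads one softmax over both sources."""
    memory = own + bert  # concatenate along the sequence axis
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in memory])
        outputs.append([sum(w * v[d] for w, v in zip(weights, memory)) for d in range(dim)])
    return outputs


if __name__ == "__main__":
    # Two toy BERT layers, two tokens, hidden size 2.
    bert_layers = [
        [[1.0, 0.0], [0.0, 1.0]],  # layer 1
        [[3.0, 0.0], [0.0, 3.0]],  # layer 2
    ]
    bert_repr = aggregate_layers(bert_layers, [0.0, 0.0])  # equal layer weights
    print(bert_repr)  # [[2.0, 0.0], [0.0, 2.0]]

    # One decoder/encoder state attending jointly over itself and BERT.
    own_states = [[1.0, 1.0]]
    print(joint_attention([[0.5, 2.0]], own_states, bert_repr))
```

Because both sources share one softmax in this sketch, each query decides how much weight to give its own states versus the BERT states, rather than summing two separately normalized attention outputs. A full model would add learned query/key/value projections and multiple heads.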