Keywords: BERT-family, path selection, natural language generation
Abstract: The Mask-Predict decoding algorithm has been widely used to enhance the generation capacity of traditional non-autoregressive (NAR) models, and it provides a good recipe for adapting pre-trained BERT-like masked language models (MLMs) to NAR generation scenarios.
However, these models, which we refer to as NAR-MLMs, are still regarded as inferior to competitive autoregressive (AR) models in terms of performance.
In this paper, we explore the core problems behind this performance gap of NAR-MLMs and investigate effective solutions.
Specifically, most related works neglect the impact of the training sequence decomposition format: unlike AR models, which naturally decompose the text sequence in a left-to-right order for both training and inference, NAR-MLMs are trained with random decompositions yet must settle on a single optimal decomposition (denoted as a decoding path) during inference.
To alleviate this mismatch, we propose decoding path selection, which enlarges the search space for finding a better decomposition, and decoding path optimization, which instills a preference over decoding paths into the model during training.
Results on various zero-shot commonsense reasoning and reading comprehension tasks, as well as several task-specific generation tasks, show that with the methods above our NAR-MLM achieves significant improvements on common benchmarks, reaching performance comparable to, and in some cases exceeding, AR pre-trained models. Our model and code will be available on GitHub.
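To make the decoding procedure concrete, the sketch below illustrates Mask-Predict with a simple decoding path selection step: several candidate unmasking orders ("decoding paths") are explored and the hypothesis the model scores highest is kept. This is a minimal illustration, not the authors' released implementation; the `model(tokens)` interface (returning per-position token log-probabilities) and the confidence-based scoring heuristic are assumptions made for the example.

```python
# Minimal sketch of Mask-Predict decoding with decoding path selection.
# Assumption: model(tokens) returns, for each position, a dict mapping
# candidate tokens to log-probabilities.
import math
import random

MASK = "[MASK]"

def mask_predict(model, length, iterations=4, rng=None):
    """One Mask-Predict run: start fully masked, then iteratively re-mask and
    re-predict the least-confident tokens. Small noise on the confidences makes
    different runs follow different decoding paths."""
    rng = rng or random.Random()
    tokens = [MASK] * length
    confidences = [0.0] * length
    for it in range(iterations):
        log_probs = model(tokens)  # one {token: log_prob} dict per position
        for i, dist in enumerate(log_probs):
            best_tok, best_lp = max(dist.items(), key=lambda kv: kv[1])
            tokens[i], confidences[i] = best_tok, best_lp
        # Linearly decay the number of tokens re-masked at each iteration.
        n_mask = int(length * (iterations - it - 1) / iterations)
        if n_mask == 0:
            break
        noisy = [(c + rng.gauss(0.0, 0.1), i) for i, c in enumerate(confidences)]
        for _, i in sorted(noisy)[:n_mask]:  # re-mask the least confident positions
            tokens[i] = MASK
    return tokens, sum(confidences)  # total log-probability as a sequence score

def decode_with_path_selection(model, length, num_paths=5, seed=0):
    """Decoding path selection: run Mask-Predict along several candidate paths
    and return the hypothesis with the highest model score."""
    best_tokens, best_score = None, -math.inf
    for k in range(num_paths):
        tokens, score = mask_predict(model, length, rng=random.Random(seed + k))
        if score > best_score:
            best_tokens, best_score = tokens, score
    return best_tokens
```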
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 14035