Path Selection Makes BERT-family Good Generators

28 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: BERT-family, path selection, natural language generation
Abstract: The Mask-Predict decoding algorithm has been widely used to enhance the generation capacity of traditional non-autoregressive (NAR) models, and it provides a good recipe for adapting pre-trained BERT-like masked language models (MLMs) to NAR generation scenarios. However, these models, which we denote as NAR-MLMs, are still regarded as inferior to competitive autoregressive (AR) models in terms of performance. In this paper, we investigate the core problems behind this performance gap and propose effective solutions. Specifically, most related work neglects the impact of the sequence decomposition format used for training: unlike AR models, which naturally decompose the text sequence in a left-to-right manner for both training and inference, NAR-MLMs are trained with random decompositions yet must settle on a single, hopefully optimal, decomposition (which we call a decoding path) at inference time. To alleviate this mismatch, we propose decoding path selection, which enlarges the search space for finding a better decomposition, and path optimization methods that teach the model a decoding-path preference during training. Results on various zero-shot commonsense reasoning and reading comprehension tasks, as well as several task-specific generation tasks, demonstrate that with these methods our NAR-MLM achieves significant improvements on common benchmarks, reaching performance comparable to, and in some cases exceeding, that of AR pre-trained models. Our model and code will be available on GitHub.
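The abstract describes Mask-Predict decoding and the proposed path selection only at a high level; the sketch below illustrates one plausible reading of the inference-time procedure. It is not the authors' implementation: the `fill_probs` model interface, the jittered re-masking rule, and the sum-of-log-probability path score are assumptions introduced purely for illustration.

```python
# Minimal sketch: Mask-Predict-style iterative decoding plus decoding path selection.
# `fill_probs` stands in for a hypothetical NAR-MLM that, given a partially masked
# sequence, returns for each position a list of (token, probability) pairs, best first.
import math
import random
from typing import Callable, List, Tuple

MASK = "<mask>"
Filler = Callable[[List[str]], List[List[Tuple[str, float]]]]


def mask_predict_one_path(fill_probs: Filler, length: int, iterations: int = 4) -> Tuple[List[str], float]:
    """Run one Mask-Predict pass; random tie-breaking on confidences means repeated
    calls commit to positions in different orders, i.e., different decoding paths.
    Returns the decoded tokens and a path score (sum of token log-probabilities)."""
    tokens = [MASK] * length
    log_score = 0.0
    for it in range(iterations):
        preds = fill_probs(tokens)                      # per-position candidate lists
        best = [(p[0][0], p[0][1]) for p in preds]      # top token and its probability
        tokens = [tok for tok, _ in best]               # fill every position
        n_mask = int(length * (1 - (it + 1) / iterations))  # linear decay schedule
        if n_mask == 0:
            log_score = sum(math.log(max(prob, 1e-9)) for _, prob in best)
            break
        # Re-mask the lowest-confidence positions; small random jitter diversifies
        # the unmasking order across repeated calls.
        ranked = sorted(range(length), key=lambda i: best[i][1] + random.random() * 1e-3)
        for i in ranked[:n_mask]:
            tokens[i] = MASK
    return tokens, log_score


def decode_with_path_selection(fill_probs: Filler, length: int, num_paths: int = 8) -> List[str]:
    """Path selection (assumed form): sample several decoding paths and keep the
    highest-scoring candidate sequence."""
    candidates = [mask_predict_one_path(fill_probs, length) for _ in range(num_paths)]
    return max(candidates, key=lambda c: c[1])[0]
```

Under this reading, path selection simply widens the search over decomposition orders at inference time, while the path optimization described in the abstract would additionally shape the model's confidences during training so that preferred paths score higher; the latter is not shown here.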
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 14035