Abstract: Recent generative language models adopt a pre-defined, monotonic left-to-right sequence decomposition for learning. This format has proven highly effective in current well-known decoder-only autoregressive large language models, but may be inefficient for learning many specific tasks such as reasoning.
In this paper, we explore the potential of other feasible decomposition formats for language models to effectively complement the autoregressive language modeling paradigm.
Specifically, we aim to find an appropriate decomposition from multiple candidates by introducing effective path selection in both training and decoding. Experiments on a total of \textbf{11} zero-shot reasoning tasks and \textbf{2} language generation tasks demonstrate the effectiveness of our methods, indicating that decomposition formats more suitable than a left-to-right order do exist, and that superior performance can be achieved simply by selecting and optimizing the decoding paths.
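To illustrate the general idea of path selection over decomposition formats (not the paper's actual algorithm), the following minimal Python sketch enumerates candidate generation orders for a short target and keeps the order with the highest total log-likelihood; the `log_prob` lookup table and all function names are hypothetical placeholders for a model that can condition on any subset of already-generated positions.

```python
import itertools

def score_path(order, log_prob):
    """Sum hypothetical log-probabilities of emitting target positions in the given order."""
    total, known = 0.0, frozenset()
    for pos in order:
        total += log_prob[(pos, known)]  # p(token at pos | tokens at already-known positions)
        known = known | {pos}
    return total

def select_best_path(num_positions, log_prob):
    """Enumerate candidate decomposition orders (decoding paths) and keep the most likely one."""
    candidates = itertools.permutations(range(num_positions))
    return max(candidates, key=lambda order: score_path(order, log_prob))

# Toy example with 2 target positions: generating position 1 before position 0 scores higher.
toy_log_prob = {
    (0, frozenset()): -2.0, (0, frozenset({1})): -0.5,
    (1, frozenset()): -1.0, (1, frozenset({0})): -1.5,
}
print(select_best_path(2, toy_log_prob))  # -> (1, 0), i.e. a non-left-to-right path
```

In practice the search space of orders grows factorially, so any real system would score or sample only a small set of candidate paths rather than enumerate them exhaustively.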
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Language Modeling, Generation
Contribution Types: Model analysis & interpretability, Reproduction study, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 1793