N-gram Prediction and Word Difference Representations for Language Modeling

ACL ARR 2024 August Submission 215 Authors

15 Aug 2024 (modified: 06 Sept 2024) · CC BY 4.0
Abstract: Causal language modeling (CLM) serves as the foundational framework underpinning the remarkable successes of recent large language models (LLMs). Despite this success, training by next-word prediction risks causing the model to focus overly on local dependencies within a sentence. While prior studies have proposed predicting the future $N$ words simultaneously, they were applied primarily to masked language modeling (MLM) and neural machine translation (NMT). In this study, we introduce a simple $N$-gram prediction framework for the CLM task. Building on this framework, we further introduce the word difference representation (WDR), a surrogate, contextualized target representation used during model training. To further improve the quality of next-word prediction, we propose an ensemble method that incorporates the prediction results for the future $N$ words. Empirical evaluations across multiple benchmark datasets covering CLM and NMT tasks demonstrate significant advantages of our proposed methods over the conventional CLM.
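To make the abstract's three ingredients concrete, below is a minimal PyTorch sketch of an $N$-gram prediction head for a causal LM. It is an illustration inferred from the abstract alone, not the authors' implementation: the head architecture, the loss weighting, the ensemble rule, and in particular the WDR definition (here assumed to be the embedding of the next word minus the embedding of the current word) are all assumptions.

```python
# Hypothetical N-gram prediction head for causal LM training, sketched from
# the abstract. All design details below are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NGramLMHead(nn.Module):
    """Predicts the next N words from each hidden state of a causal LM."""

    def __init__(self, hidden_dim: int, vocab_size: int, n: int = 2):
        super().__init__()
        self.n = n
        # One projection per future offset (offset 1 = ordinary next-word head).
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in range(n)
        )

    def forward(self, hidden, targets, embed):
        """hidden: (B, T, H) decoder states; targets: (B, T) token ids;
        embed: the model's input embedding table, shape (V, H).
        Assumes the hidden size H equals the embedding size."""
        loss = hidden.new_zeros(())
        for k, head in enumerate(self.heads, start=1):
            if hidden.size(1) <= k:
                break
            logits = head(hidden[:, :-k])  # state at t predicts token t + k
            loss = loss + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                targets[:, k:].reshape(-1),
            )
        # WDR auxiliary target (assumption: the difference between the
        # embeddings of consecutive target words serves as a contextualized
        # regression target for the hidden state).
        if hidden.size(1) > 1:
            wdr = embed[targets[:, 1:]] - embed[targets[:, :-1]]  # (B, T-1, H)
            loss = loss + F.mse_loss(hidden[:, :-1], wdr.detach())
        return loss / self.n

    @torch.no_grad()
    def ensemble_next(self, hidden):
        """Ensemble next-word logits: head k is applied to the state k steps
        back, so all N heads vote on the same upcoming token."""
        logits = self.heads[0](hidden[:, -1])
        for k in range(2, min(self.n, hidden.size(1)) + 1):
            logits = logits + self.heads[k - 1](hidden[:, -k])
        return logits
```

In this reading, each head reuses the shared decoder state, so the extra cost over standard CLM is one linear projection per additional offset, and the ensemble at inference time simply sums the logits that the $N$ heads assign to the same next token.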
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Language Modeling, N-gram Prediction, Word Embedding Composition
Contribution Types: NLP engineering experiment
Languages Studied: English, German, Turkish
Submission Number: 215