Large Vocabulary Size Improves Large Language Models

ACL ARR 2024 August Submission 385 Authors

16 Aug 2024 (modified: 13 Sept 2024) · ACL ARR 2024 August Submission · Readers: Everyone · License: CC BY 4.0
Abstract: This paper empirically investigates the relationship between subword vocabulary size and the performance of large language models (LLMs) to provide insight into how the vocabulary size should be determined. Experimental results show that larger vocabularies lead to better LLM performance. We also consider a continual training scenario in which a pre-trained language model is further trained on a different target language, and we introduce a simple method that adopts a new vocabulary in place of the pre-defined one. We show that the model trained with the new vocabulary outperforms the model that keeps the vocabulary used in pre-training.
Paper Type: Short
Research Area: Language Modeling
Research Area Keywords: pre-training, continual learning
Contribution Types: NLP engineering experiment
Languages Studied: English, Japanese
Submission Number: 385
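
The abstract describes replacing the pre-defined vocabulary with a new one when continually training on a different target language, but does not spell out the procedure here. The following is only a minimal, hypothetical sketch of one common way to swap in a new tokenizer with Hugging Face Transformers; the model and tokenizer paths are placeholders, and this is not necessarily the authors' method.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths: a pre-trained causal LM and a tokenizer built on the target language.
model = AutoModelForCausalLM.from_pretrained("path/to/pretrained-model")
new_tokenizer = AutoTokenizer.from_pretrained("path/to/target-language-tokenizer")

# Match the embedding (and tied LM-head) size to the new vocabulary.
model.resize_token_embeddings(len(new_tokenizer))

# The old embedding rows no longer correspond to the new token ids,
# so re-initialize them; they are then learned during continual training.
std = getattr(model.config, "initializer_range", 0.02)
torch.nn.init.normal_(model.get_input_embeddings().weight, mean=0.0, std=std)
model.tie_weights()  # keep tied output embeddings consistent with the new input embeddings

After this step, the model would be trained on the target-language corpus tokenized with the new vocabulary, as in the continual training setting the abstract describes.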