Archiving Submission: No (non-archival)
Previous Venue If Non Archival: Under review at NeurIPS 2025
Keywords: Tokenizer Selection, Model-Input Co-optimization, Tokenizer Design Space, Efficient Pretraining, Arithmetic Reasoning
TL;DR: To overcome costly tokenizer searches for LLMs, especially for numerical tasks, we co-optimize the tokenizer and model in a single training phase, effectively identifying high-performing number tokenizers for arithmetic reasoning.
Abstract: Tokenization fundamentally shapes how language models perceive and process input, with substantial downstream effects---especially in tasks requiring symbolic or numerical precision. Yet, selecting an optimal tokenizer from a vast design space remains computationally prohibitive, typically requiring full-scale model training for each candidate. Focusing on arithmetic reasoning, we propose You Only Train Once (YOTO), a unified training framework that jointly optimizes the language model and a parameterized distribution over candidate tokenizers. By training a single model using a merged vocabulary and sampling tokenizations adaptively, YOTO enables efficient co-adaptation between model and tokenizer. Applied to arithmetic tasks, YOTO discovers high-performing number tokenizers while dramatically reducing evaluation cost. Our results highlight a promising path toward jointly optimizing tokenizers and models in a principled, scalable manner.
Submission Number: 37
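
The abstract describes training one model over a merged vocabulary while maintaining a learnable distribution over candidate tokenizers and sampling tokenizations adaptively. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: the candidate number tokenizers, the tiny GRU language model, and the REINFORCE-style update for the tokenizer logits are all assumptions chosen to make the joint-optimization loop concrete.

```python
# Hypothetical sketch: jointly train a shared model and a learnable
# distribution over candidate number tokenizers (all names are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Merged vocabulary covering all candidate tokenizations (assumed design).
VOCAB = {tok: i for i, tok in enumerate(
    [str(d) for d in range(10)] +          # single digits
    [f"{d:02d}" for d in range(100)] +     # 2-digit chunks
    [f"{d:03d}" for d in range(1000)]      # 3-digit chunks
)}

def tok_single(num: str):                  # "1234" -> ["1", "2", "3", "4"]
    return list(num)

def tok_chunk3(num: str):                  # "1234" -> ["1", "234"]
    head = len(num) % 3
    out = [num[:head]] if head else []
    return out + [num[j:j + 3] for j in range(head, len(num), 3)]

CANDIDATES = [tok_single, tok_chunk3]

class TinyLM(nn.Module):
    """Stand-in causal LM over the merged vocabulary."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.out(h)

model = TinyLM(len(VOCAB))
tok_logits = nn.Parameter(torch.zeros(len(CANDIDATES)))   # tokenizer distribution
opt = torch.optim.Adam(list(model.parameters()) + [tok_logits], lr=1e-3)

def lm_loss(tokenizer, numbers):
    """Next-token loss of the shared model under one candidate tokenization."""
    losses = []
    for num in numbers:
        ids = torch.tensor([VOCAB[t] for t in tokenizer(num)])
        if len(ids) < 2:
            continue
        logits = model(ids[:-1].unsqueeze(0)).squeeze(0)
        losses.append(F.cross_entropy(logits, ids[1:]))
    return torch.stack(losses).mean()

for step in range(200):
    batch = [str(torch.randint(1000, 10**6, (1,)).item()) for _ in range(8)]
    dist = torch.distributions.Categorical(logits=tok_logits)
    k = dist.sample()                      # adaptively sample a tokenizer
    loss = lm_loss(CANDIDATES[int(k)], batch)
    # REINFORCE-style surrogate: lower LM loss -> relatively higher probability.
    surrogate = loss + dist.log_prob(k) * loss.detach()
    opt.zero_grad()
    surrogate.backward()
    opt.step()

print("tokenizer probabilities:", torch.softmax(tok_logits, dim=0).tolist())
```

Under these assumptions, a single training run both fits the model and concentrates probability mass on the tokenizer that yields lower language-modeling loss, avoiding a separate full training run per candidate.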