Archiving Submission: No (non-archival)
Previous Venue If Non Archival: Under review at NeurIPS 2025
Keywords: Tokenizer Selection, Model-Input Co-optimization, Tokenizer Design Space, Efficient Pretraining, Arithmetic Reasoning
TL;DR: To overcome costly tokenizer searches for LLMs, especially for numerical tasks, we co-optimize the tokenizer and model in a single training phase, effectively identifying high-performing number tokenizers for arithmetic reasoning.
Abstract: Tokenization fundamentally shapes how language models perceive and process input, with substantial downstream effects---especially in tasks requiring symbolic or numerical precision. Yet, selecting an optimal tokenizer from a vast design space remains computationally prohibitive, typically requiring full-scale model training for each candidate. Focusing on arithmetic reasoning, we propose You Only Train Once (YOTO), a unified training framework that jointly optimizes the language model and a parameterized distribution over candidate tokenizers. By training a single model using a merged vocabulary and sampling tokenizations adaptively, YOTO enables efficient co-adaptation between model and tokenizer. Applied to arithmetic tasks, YOTO discovers high-performing number tokenizers while dramatically reducing evaluation cost. Our results highlight a promising path toward jointly optimizing tokenizers and models in a principled, scalable manner.
Submission Number: 37
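
The abstract describes training one model over a merged vocabulary while maintaining a learnable distribution over candidate tokenizers and sampling tokenizations adaptively. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: the candidate number tokenizers, the tiny GRU language model, and the REINFORCE-style update for the tokenizer logits are all assumptions chosen to make the joint-optimization loop concrete.

```python
# Hypothetical sketch: jointly train a shared model and a learnable
# distribution over candidate number tokenizers (all names are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Merged vocabulary covering all candidate tokenizations (assumed design).
VOCAB = {tok: i for i, tok in enumerate(
    [str(d) for d in range(10)] +          # single digits
    [f"{d:02d}" for d in range(100)] +     # 2-digit chunks
    [f"{d:03d}" for d in range(1000)]      # 3-digit chunks
)}

def tok_single(num: str):                  # "1234" -> ["1", "2", "3", "4"]
    return list(num)

def tok_chunk3(num: str):                  # "1234" -> ["1", "234"]
    head = len(num) % 3
    out = [num[:head]] if head else []
    return out + [num[j:j + 3] for j in range(head, len(num), 3)]

CANDIDATES = [tok_single, tok_chunk3]

class TinyLM(nn.Module):
    """Stand-in causal LM over the merged vocabulary."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.out(h)

model = TinyLM(len(VOCAB))
tok_logits = nn.Parameter(torch.zeros(len(CANDIDATES)))   # tokenizer distribution
opt = torch.optim.Adam(list(model.parameters()) + [tok_logits], lr=1e-3)

def lm_loss(tokenizer, numbers):
    """Next-token loss of the shared model under one candidate tokenization."""
    losses = []
    for num in numbers:
        ids = torch.tensor([VOCAB[t] for t in tokenizer(num)])
        if len(ids) < 2:
            continue
        logits = model(ids[:-1].unsqueeze(0)).squeeze(0)
        losses.append(F.cross_entropy(logits, ids[1:]))
    return torch.stack(losses).mean()

for step in range(200):
    batch = [str(torch.randint(1000, 10**6, (1,)).item()) for _ in range(8)]
    dist = torch.distributions.Categorical(logits=tok_logits)
    k = dist.sample()                      # adaptively sample a tokenizer
    loss = lm_loss(CANDIDATES[int(k)], batch)
    # REINFORCE-style surrogate: lower LM loss -> relatively higher probability.
    surrogate = loss + dist.log_prob(k) * loss.detach()
    opt.zero_grad()
    surrogate.backward()
    opt.step()

print("tokenizer probabilities:", torch.softmax(tok_logits, dim=0).tolist())
```

Under these assumptions, a single training run both fits the model and concentrates probability mass on the tokenizer that yields lower language-modeling loss, avoiding a separate full training run per candidate.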