Abstract: The disconnect between tokenizer creation and model training in language models has been known to allow certain inputs, such as the infamous SolidGoldMagikarp token, to induce unwanted behaviour. Although such 'glitch tokens', which are present in the tokenizer vocabulary but nearly or fully absent from the training data, have been observed across a variety of different models, a consistent way of identifying them has been missing. We present a comprehensive analysis of Large Language Model tokenizers, specifically targeting the detection of under-trained tokens. Through a combination of tokenizer analysis, model weight-based indicators, and prompting techniques, we develop effective methods for automatically detecting these problematic tokens. Our findings demonstrate the prevalence of such tokens across various models and provide insights into improving the efficiency and safety of language models.
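As a rough illustration of the "model weight-based indicators" mentioned in the abstract, the sketch below flags vocabulary entries whose unembedding rows lie unusually close to the mean row direction, on the intuition that rarely-updated rows stay near their shared initialisation. This is a minimal sketch, not the paper's exact procedure: the model choice ("gpt2"), the similarity-to-mean heuristic, and the 2% cutoff are all illustrative assumptions.

```python
# Hedged sketch of one possible weight-based indicator for under-trained
# tokens. Assumptions (not from the paper): model "gpt2", cosine similarity
# to the mean unembedding row as the signal, and an arbitrary 2% cutoff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with accessible output embeddings
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

with torch.no_grad():
    # Unembedding matrix: one row per vocabulary token.
    W = model.get_output_embeddings().weight.detach()
    mean_vec = W.mean(dim=0, keepdim=True)
    # Rows that received few gradient updates tend to remain close to the
    # bulk/mean direction, so high similarity marks candidate glitch tokens.
    sims = torch.nn.functional.cosine_similarity(W, mean_vec, dim=1)

k = max(1, int(0.02 * W.shape[0]))  # flag the top 2% as candidates (arbitrary)
candidate_ids = torch.topk(sims, k).indices.tolist()
for tid in candidate_ids[:10]:
    print(tid, repr(tokenizer.convert_ids_to_tokens(tid)))
```

In practice such a weight-based score is only a first filter; the abstract pairs it with tokenizer analysis and prompting-based verification before declaring a token under-trained.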
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: generative models, word embeddings, subword representations
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models
Languages Studied: English, Japanese, Chinese, and various others studied more incidentally
Submission Number: 639