Keywords: Large Language Models (LLMs), Padding, Robustness, Reliability, Activations, Generation, Bias, Safety
TL;DR: Padding tokens are often assumed harmless in LLMs, but when mishandled they can distort activations, harm generation quality, shift bias, and weaken safety. Our study across Llama, Gemma, and Qwen shows that padding is a real robustness risk that must be addressed.
Abstract: Padding tokens are widely used in large language models (LLMs) to equalize sequence lengths during batched inference. While they should be fully masked, implementation errors can cause them to influence computation, and the extent of this influence is not well understood. We systematically study this effect across three open-source model families (Llama, Gemma, Qwen), inserting controlled amounts of padding and evaluating outcomes along four axes: activations, generation quality, bias, and safety. Even small amounts of padding shift hidden representations, degrade quality in smaller models, alter bias in unpredictable ways, and weaken safety guardrails. These findings demonstrate that padding is not a harmless detail but a robustness risk that must be carefully handled in deployment. A reference implementation is available at https://anonymous.4open.science/r/silent_tokens_loud_effects-A851.
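The masking failure described in the abstract can be illustrated with a toy example. The sketch below (not the paper's actual pipeline; the mean-pooling setup, array shapes, and zero-valued pad states are illustrative assumptions) shows how pooled representations stay intact when padding positions are masked out, but shift when the mask is dropped:

```python
import numpy as np

def mean_pool(hidden, mask=None):
    """Pool token states into one vector.

    hidden: (seq_len, dim) hidden states.
    mask:   (seq_len,) with 1 = real token, 0 = padding.
            If mask is None, padding leaks into the pooled result.
    """
    if mask is None:
        return hidden.mean(axis=0)
    m = mask[:, None].astype(hidden.dtype)
    return (hidden * m).sum(axis=0) / m.sum()

rng = np.random.default_rng(0)
real = rng.normal(size=(4, 8))      # states for 4 real tokens
pad = np.zeros((3, 8))              # states for 3 padding tokens (toy values)
padded = np.vstack([real, pad])     # batched sequence, length equalized to 7
mask = np.array([1, 1, 1, 1, 0, 0, 0])

correct = mean_pool(padded, mask)   # identical to pooling the un-padded sequence
leaky = mean_pool(padded)           # padding dilutes the representation
```

Here `correct` matches the pooled representation of the un-padded sequence, while `leaky` is scaled by 4/7 — a small but systematic shift of exactly the kind the study measures in real model activations.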
Submission Number: 67