GlitchMiner: Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization

ACL ARR 2025 May Submission 2127 Authors

18 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Glitch tokens in Large Language Models (LLMs) are rare yet critical anomalies that can trigger unpredictable and erroneous model behaviors, undermining reliability and safety. Existing detection methods predominantly rely on predefined embedding or activation patterns, limiting their generalizability across diverse architectures and potentially missing novel glitch manifestations. We propose GlitchMiner, a gradient-based discrete optimization framework that identifies glitch tokens by maximizing prediction entropy to capture uncertainty, guided by a local search strategy for efficient token space exploration. Extensive evaluations on ten diverse LLM architectures demonstrate that GlitchMiner significantly outperforms state-of-the-art baselines in detection accuracy and efficiency. Our approach advances robust, architecture-agnostic glitch token detection, enhancing the security and trustworthiness of LLM-based applications in critical domains.
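The abstract names the two ingredients of the method: an entropy-maximization objective and gradient-guided local search over the token space. Since the paper body is not shown here, the following is only a minimal sketch of how such a loop could look in PyTorch/transformers. The probe template, the first-order Taylor scoring, the neighborhood size `k`, and the `gpt2` stand-in model are all illustrative assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # hypothetical stand-in; the paper evaluates ten LLM architectures
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

E = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)


def entropy_and_grad(token_id: int):
    """Entropy of the next-token distribution when a candidate token is placed
    in a simple probe prompt, plus its gradient w.r.t. the candidate's input
    embedding. The probe template below is an illustrative assumption."""
    prefix = tok("Please repeat this token: ", return_tensors="pt").input_ids
    prefix_emb = model.get_input_embeddings()(prefix)
    cand_emb = E[token_id].clone().requires_grad_(True)  # leaf tensor for autograd
    inputs = torch.cat([prefix_emb, cand_emb[None, None, :]], dim=1)
    logits = model(inputs_embeds=inputs).logits[0, -1]
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    (grad,) = torch.autograd.grad(entropy, cand_emb)
    return entropy.detach(), grad


def local_search_step(token_id: int, k: int = 32) -> int:
    """One gradient-guided local-search move: score nearby tokens with a
    first-order Taylor estimate of the entropy change and move to the most
    promising neighbor."""
    _, grad = entropy_and_grad(token_id)
    # First-order estimate: H(v) - H(t) ~ (e_v - e_t) . grad_e H
    delta = (E - E[token_id]) @ grad
    # Local search: restrict candidates to the k nearest tokens in embedding space.
    dists = torch.cdist(E[token_id][None], E)[0]
    neighbors = dists.topk(k + 1, largest=False).indices[1:]  # drop self (distance 0)
    return int(neighbors[delta[neighbors].argmax()])


# Greedy exploration from an arbitrary starting token: follow local-search
# moves and flag tokens whose exact prediction entropy is high as glitch
# candidates. Starting point and step count are arbitrary here.
tid = 1000
for _ in range(5):
    h, _ = entropy_and_grad(tid)
    print(tid, repr(tok.decode([tid])), float(h))
    tid = local_search_step(tid)
```

The appeal of first-order scoring in this kind of scheme is that a single backward pass ranks every vocabulary token at once, while restricting moves to embedding-space neighbors keeps the linear approximation in the region where it is trustworthy.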
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Glitch Tokens, Gradient-based Discrete Optimization
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 2127