Keywords: Code-mixing, LLM, Jailbreaking
Abstract: Large language models (LLMs) remain demonstrably unsafe despite sophisticated safety alignment techniques and multilingual red-teaming. Recent red-teaming work, however, has focused on incremental gains in attack success rather than on identifying underlying architectural vulnerabilities in models. In this work, we present \textbf{CMP-RT}, a novel red-teaming probe that combines code-mixing with phonetic perturbations (CMP), exposing a tokenizer-level safety vulnerability in transformers. By combining realistic elements of digital communication such as code-mixing and textese, CMP-RT preserves phonetics while perturbing safety-critical tokens, allowing harmful prompts to bypass alignment mechanisms while remaining highly interpretable, thereby exposing a gap between pre-training and safety alignment. Our results demonstrate robustness against standard defenses, attack scalability, and generalization of the vulnerability across modalities and to SOTA models such as Gemini-3-Pro, establishing CMP-RT as a serious threat model and highlighting tokenization as an under-examined vulnerability in current safety pipelines.
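The core idea of the abstract, perturbing only the safety-critical tokens of a prompt with textese-style substitutions while leaving the rest readable, can be sketched as below. This is a minimal illustration, not the paper's actual implementation; the substitution table, function names, and the choice of which tokens count as safety-critical are all hypothetical.

```python
# Illustrative sketch of the CMP idea: apply character-level,
# textese-style substitutions to safety-critical tokens so that the
# perturbed prompt stays human-interpretable but tokenizes differently.
# The substitution table below is an assumption for demonstration only.

TEXTESE = {
    "a": "@",
    "e": "3",
    "i": "1",
    "o": "0",
    "s": "5",
}


def textese_perturb(word: str) -> str:
    """Rewrite a single word using the textese substitution table."""
    return "".join(TEXTESE.get(ch, ch) for ch in word.lower())


def perturb_prompt(prompt: str, critical: set) -> str:
    """Perturb only the tokens flagged as safety-critical."""
    return " ".join(
        textese_perturb(tok) if tok.lower() in critical else tok
        for tok in prompt.split()
    )


if __name__ == "__main__":
    # Benign example: only the flagged token is rewritten.
    print(perturb_prompt("describe a weapon", {"weapon"}))
```

In practice the paper additionally code-mixes the prompt across languages (here English and Hindi); this sketch shows only the phonetic-perturbation half of the probe.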
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: red teaming, jailbreaking, multilingual, multimodal, LLM safety
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models
Languages Studied: English, Hindi
Submission Number: 9665