Latent Mechanisms of Code-Switching in Large Language Models

Published: 11 Jun 2026, Last Modified: 11 Jun 2026 | Mech Interp Workshop ICML 2026 Virtual poster | Readers: Everyone | License: CC BY 4.0
Keywords: Methods (probing, steering, causal interventions), Concept Discovery (e.g., SAEs, dictionary learning)
TL;DR: Comparison of methods for discovering language-controlling latents in code-switching settings
Abstract: Multilingual large language models can exhibit _unintended code-switching_ -- unnecessarily alternating between languages during generation. We present a comparative study of three methods for identifying language-controlling latents in cross-layer transcoders: activation value-based selection (ValSel), activation frequency-based selection (FreqSel), and LLM-generated latent annotation-based selection (AnnSel). To evaluate how well each method identifies such latents, we introduce two multilingual code-switching benchmarks designed for fine-grained analysis of language steering across seven languages. Through targeted intervention experiments on Gemma-2-2B and Qwen3-4B, we find that all three methods effectively manipulate the generation language: FreqSel achieves the strongest overall performance, while AnnSel offers interpretable latent selection through explicit language annotations. We also study how redundantly language control is represented in the latent space of these models, using a knock-out analysis whose results suggest representation divergence.
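The abstract names ValSel and FreqSel but does not spell out their scoring rules, so the following is only an illustrative sketch of how value-based and frequency-based selection could differ. Everything in it is an assumption made for illustration: the function name `select_latents`, the contrastive score (target-language statistic minus other-language statistic), the firing threshold of zero, and the toy activations. It should not be read as the authors' implementation.

```python
def select_latents(target_acts, other_acts, k=1, mode="freq"):
    """Rank latents by how specific they are to a target language.

    target_acts, other_acts: lists of rows, one row per token, each row
    holding one non-negative activation per latent (hypothetical inputs).
    mode="val":  score = difference in mean activation value (assumed rule).
    mode="freq": score = difference in firing frequency (assumed rule).
    Returns the indices of the k highest-scoring latents.
    """
    n_latents = len(target_acts[0])

    def mean_value(rows, j):
        # Average activation of latent j over all tokens.
        return sum(row[j] for row in rows) / len(rows)

    def firing_rate(rows, j):
        # Fraction of tokens on which latent j is active at all.
        return sum(1 for row in rows if row[j] > 0) / len(rows)

    if mode == "val":
        stat = mean_value
    elif mode == "freq":
        stat = firing_rate
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    scores = [stat(target_acts, j) - stat(other_acts, j) for j in range(n_latents)]
    # sorted() is stable, so ties keep the lower latent index first.
    ranked = sorted(range(n_latents), key=lambda j: -scores[j])
    return ranked[:k]


# Toy activations for 4 latents on 3 target-language and 3 other-language tokens.
target = [
    [0.2, 0.0, 9.0, 0.0],
    [0.3, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.0],
]
other = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.0],
]

top_by_value = select_latents(target, other, k=1, mode="val")   # [2]
top_by_freq = select_latents(target, other, k=1, mode="freq")   # [0]
```

On this toy data the two rules disagree: latent 2 fires once but very strongly, so it wins under the value-based score, while latent 0 fires weakly on every target-language token and wins under the frequency-based score.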
Submission Number: 696