LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety

ACL ARR 2026 January Submission 10076 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Large Language Models, Safety Alignment, Semantic Alignment, Multilingual LLM Safety
Abstract: Large language models (LLMs) exhibit better safety performance in high-resource languages than in low-resource languages. We attribute this gap to a mismatch between the model's language-agnostic semantic understanding and a safety alignment that is biased toward high-resource languages. Based on this insight, we empirically identify the semantic bottleneck in LLMs: intermediate layers in which the geometry of model representations is governed primarily by shared semantic content rather than language identity. We then propose Language-Agnostic Semantic Alignment (LASA), which anchors safety alignment directly in the semantic bottleneck. Experiments show that LASA substantially improves safety across all languages: the average attack success rate (ASR) drops from 24.7\% to 2.8\% on LLaMA-3.1-8B-Instruct and remains within 3–4\% across Qwen2.5 and Qwen3 Instruct models (7B–32B). Beyond these results, our analysis and method offer a representation-level perspective on LLM safety, suggesting that safety alignment should anchor safety understanding not in surface text but in the model's language-agnostic semantic space.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Language Modeling, Ethics, Bias, and Fairness
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Chinese, Italian, Vietnamese, Arabic, Korean, Thai, Bengali, Swahili, Javanese
Submission Number: 10076