Safety in Spanish: A Cross-Lingual Evaluation of Jailbreak Vulnerability in LLMs

Published: 06 May 2026 · Last Modified: 06 May 2026 · Preprint · CC BY 4.0
Abstract: Large language models (LLMs) are widely used in multilingual settings, yet most safety evaluations focus on English. In this work, we present a cross-lingual study of jailbreak vulnerability with a focus on Spanish. We construct a benchmark of 1,107 prompts across English, Spanish, and code-switched variants, covering multiple attack strategies and harm categories, along with a benign baseline. We evaluate several open-source and proprietary models using attack success rate (ASR) and automated safety judgment. Most models show similar vulnerability across languages, but Llama 3 8B and Qwen 2.5 7B exhibit higher ASR in Spanish, while Mistral 7B remains highly vulnerable across all conditions. We also find that direct prompts outperform role-play and hypothetical strategies, and that code-switching reduces attack effectiveness. These results highlight that multilingual safety is uneven across models and that language can influence jailbreak behavior in non-trivial ways.
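The attack success rate (ASR) used in the evaluation above can be sketched as a simple fraction: of all adversarial prompts sent to a model, how many elicited a response that the automated judge labeled unsafe. The function name and boolean judge-label format below are illustrative assumptions, not details from the paper.

```python
def attack_success_rate(judge_labels):
    """Fraction of jailbreak attempts judged successful.

    judge_labels: iterable of booleans, True meaning the automated safety
    judge labeled the model's response as unsafe (jailbreak succeeded).
    """
    labels = list(judge_labels)
    if not labels:
        return 0.0  # no attempts, define ASR as 0
    return sum(labels) / len(labels)

# Example: 3 of 4 adversarial prompts succeed -> ASR = 0.75
print(attack_success_rate([True, True, False, True]))
```

Per-language ASR (English vs. Spanish vs. code-switched) follows by partitioning the prompt set by condition and applying the same fraction to each partition.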