Portuguese-IFEval: Instruction-Following Evaluation in Portuguese

ACL ARR 2026 January Submission 7848 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Instruction-Following, Large Language Models, Portuguese NLP, Multilingual Benchmark, Model Evaluation, Cross-Lingual Analysis
Abstract: Instruction-following benchmarks have become central to evaluating large language models, yet their multilingual extensions remain largely translation-based. This design choice limits the ability of current evaluations to capture language-specific pragmatic, morphosyntactic, and orthographic constraints. We introduce \textbf{IFEval-PT}, a Portuguese extension of IFEval constructed through semantic regionalization rather than literal translation. The benchmark comprises 130 Portuguese prompts with one to three verifiable instructions, combining adapted and Portuguese-specific constraints. Evaluating proprietary, open-source, and Portuguese-tuned models under a unified protocol, we demonstrate systematic performance degradation on Portuguese-specific instructions relative to translated benchmarks. These results establish that translation alone is insufficient for faithful multilingual instruction-following evaluation and that language-aware benchmark design is essential to expose hidden failure modes. We will publicly release the benchmark, including all prompts and evaluation code, to support reproducibility and further research.
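The abstract describes prompts paired with "verifiable instructions," i.e., constraints whose satisfaction can be checked programmatically, in the style of IFEval. As a minimal illustrative sketch (the function names, the specific constraints, and the strict all-must-pass scoring shown here are assumptions for illustration, not the paper's actual evaluation code):

```python
import re

def check_max_words(response: str, limit: int) -> bool:
    """Verify a 'respond in at most N words' instruction."""
    return len(response.split()) <= limit

def check_no_accents(response: str) -> bool:
    """Verify a hypothetical Portuguese-specific orthographic constraint:
    'answer without using accented characters'."""
    return not re.search(r"[áàâãéêíóôõúç]", response, re.IGNORECASE)

def evaluate(response: str, checks) -> bool:
    """Prompt-level strict scoring: every instruction attached to the
    prompt must pass for the response to count as correct."""
    return all(check(response) for check in checks)

# A prompt with two instructions: <= 10 words, no accented characters.
resp = "Lisboa e a capital de Portugal"
print(evaluate(resp, [lambda r: check_max_words(r, 10), check_no_accents]))
```

Checkers of this kind make the benchmark's verdicts reproducible without a judge model, which is what allows the unified protocol across proprietary, open-source, and Portuguese-tuned models.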
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Multilingualism and Cross-Lingual NLP, Resources and Evaluation
Contribution Types: Data resources
Languages Studied: Portuguese, Spanish, French, Japanese, English
Submission Number: 7848