Portuguese-IFEval: Instruction-Following Evaluation in Portuguese

ACL ARR 2026 January Submission 7848 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Instruction-Following, Large Language Models, Portuguese NLP, Multilingual Benchmark, Model Evaluation, Cross-Lingual Analysis
Abstract: Instruction-following benchmarks have become central to evaluating large language models, yet their multilingual extensions remain largely translation-based. This design choice limits the ability of current evaluations to capture language-specific pragmatic, morphosyntactic, and orthographic constraints. We introduce \textbf{IFEval-PT}, a Portuguese extension of IFEval constructed through semantic regionalization rather than literal translation. The benchmark comprises 130 Portuguese prompts with one to three verifiable instructions, combining adapted and Portuguese-specific constraints. Evaluating proprietary, open-source, and Portuguese-tuned models under a unified protocol, we demonstrate systematic performance degradation on Portuguese-specific instructions relative to translated benchmarks. These results establish that translation alone is insufficient for faithful multilingual instruction-following evaluation and that language-aware benchmark design is essential to expose hidden failure modes. We will publicly release the benchmark, including all prompts and evaluation code, to support reproducibility and further research.
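The abstract describes prompts paired with "verifiable instructions," i.e., constraints whose satisfaction can be checked programmatically, in the style of IFEval. As a minimal illustrative sketch (the function names, the specific constraints, and the strict all-must-pass scoring shown here are assumptions for illustration, not the paper's actual evaluation code):

```python
import re

def check_max_words(response: str, limit: int) -> bool:
    """Verify a 'respond in at most N words' instruction."""
    return len(response.split()) <= limit

def check_no_accents(response: str) -> bool:
    """Verify a hypothetical Portuguese-specific orthographic constraint:
    'answer without using accented characters'."""
    return not re.search(r"[áàâãéêíóôõúç]", response, re.IGNORECASE)

def evaluate(response: str, checks) -> bool:
    """Prompt-level strict scoring: every instruction attached to the
    prompt must pass for the response to count as correct."""
    return all(check(response) for check in checks)

# A prompt with two instructions: <= 10 words, no accented characters.
resp = "Lisboa e a capital de Portugal"
print(evaluate(resp, [lambda r: check_max_words(r, 10), check_no_accents]))
```

Checkers of this kind make the benchmark's verdicts reproducible without a judge model, which is what allows the unified protocol across proprietary, open-source, and Portuguese-tuned models.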
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Multilingualism and Cross-Lingual NLP, Resources and Evaluation
Contribution Types: Data resources
Languages Studied: Portuguese, Spanish, French, Japanese, English
Submission Number: 7848