What Matters When Building Universal Multilingual Named Entity Recognition Models?

ACL ARR 2026 May Submission15084 Authors

26 May 2026 (modified: 18 Jun 2026), ACL ARR 2026 May Submission, CC BY 4.0
Keywords: Multilingual NER, Empirical evaluation
Abstract: Recent progress in universal multilingual named entity recognition (NER) has been driven by multilingual transformer models, task-specific architectures, custom loss functions, and large-scale training datasets. However, despite substantial prior work, we find that many critical design decisions for such models are made without systematic justification, with individual components evaluated only in combination rather than in isolation. We argue that this lack of rigor impedes progress in the field by making it difficult to identify which choices improve multilingual generalization. In this work, we conduct extensive experiments on transformer backbones, architectures, training objectives, data composition, and threshold selection. Building on these findings, we present Otter, a universal multilingual NER model supporting over 100 languages. Otter achieves consistent improvements over strong multilingual NER baselines, outperforming similarly sized models by 5.3 percentage points in F1 and achieving performance competitive with generative models 90× larger, while being substantially more efficient. We release model checkpoints as well as training and evaluation code to facilitate reproducibility and future research.
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: cross-lingual transfer
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Over 150 languages (see the individual training and evaluation datasets for comprehensive lists)
EMNLP 2026 AI Reviewing Experiment: yes
Submission Number: 15084