TailNLG: A Multilingual Benchmark Addressing Verbalization of Long-Tail Entities

ACL ARR 2026 January Submission 6250 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Natural Language Generation, bias, long tail, Knowledge Graphs, RDF-to-text
Abstract: The automatic verbalization of structured knowledge is a key task for making knowledge graphs accessible to non-expert users and for supporting retrieval-augmented generation systems. Although recent advances in RDF-to-text generation have improved multilingual coverage, little attention has been paid to potential biases in the verbalization of rare entities, frequently known as long-tail entities. In this work, we present the first systematic study of long-tail entities in RDF-to-text generation. We introduce TailNLG, a new multilingual benchmark in English, Italian, and Spanish, built from Wikidata and covering entities with varying levels of popularity. We evaluate three different families of large language models in zero-shot settings and compare their performance on rare versus common entities, as well as against the established WebNLG benchmark. Our results reveal a consistent bias against long-tail entities: embedding-based scores are lower, and model uncertainty is higher for rare entities. We further show that the impact of long-tail entities varies across models and languages, and that existing evaluation metrics do not consistently capture these differences, highlighting the need for more reliable evaluation frameworks.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, multilingual corpora, NLP datasets, evaluation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English, Italian, Spanish
Submission Number: 6250