Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0

Published: 09 Apr 2022, Last Modified: 22 Oct 2023, BigScience#5, Readers: Everyone
Keywords: NER, historical texts, zero-shot, transfer learning, prompt-based, multilingual, BigScience
TL;DR: We test zero-shot NER on historical texts with the T0 language model, and also probe it for language and publication-date recognition.
Abstract: In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in three languages as a test bed, we use prompts to extract possible named entities. Our results show that a naive approach to prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but they also highlight the potential of such an approach for historical languages lacking labeled datasets. Moreover, we find that T0-like models can be probed to predict the publication date and language of a document, which could be very relevant for the study of historical texts.
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:2204.05211/code)