Abstract: We present a corpus of Uzbek news articles with manually annotated named entities. The corpus comprises 500 articles (222,536 tokens) sourced from Qalampir, an online news source in Uzbekistan, and is annotated with three entity classes (person, location, organization). It can be used to develop and evaluate natural language processing (NLP) models for Uzbek. We conducted a baseline experiment on the Qalampir corpus using pre-trained models. The results showed that the pre-trained model CINO outperformed the other multilingual models.
External IDs: dblp:journals/lre/YusufuAYALJ25