Training and Evaluation of Word Embedding Models for Azerbaijani Language

Published: 2020 · Last Modified: 27 Jan 2026 · MIDI 2020 · CC BY-SA 4.0
Abstract: Recently, natural language representation models have attracted an increasing amount of attention from researchers. Various approaches have been proposed for learning these continuous vector representations. In this work, we analyze the effectiveness of several word embedding learning approaches for the Azerbaijani language. We concentrate on two methodologies: (1) word2vec and (2) GloVe. We trained both models on a text corpus of cleaned Azerbaijani news articles and parsed books. Moreover, we created intrinsic analogy tasks for Azerbaijani, following the scheme introduced by Mikolov et al. To evaluate word vector models in Azerbaijani, we perform these intrinsic analogy tasks as well as two separate extrinsic evaluation tasks. This work is one of the first reports on the evaluation of word embeddings on both intrinsic and extrinsic tasks for Azerbaijani, a low-resource, agglutinative language.
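The intrinsic analogy tasks mentioned above follow Mikolov et al.'s evaluation scheme, where an analogy "a is to b as c is to ?" is answered by the vocabulary word whose vector is closest (by cosine similarity) to vec(b) - vec(a) + vec(c). The sketch below illustrates this scoring rule with hand-made toy vectors; the Azerbaijani words (kişi "man", kral "king", qadın "woman", kraliça "queen") and their vectors are purely illustrative and do not come from the trained models:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(vectors, a, b, c):
    """3CosAdd: return the word d maximizing cos(vec(d), vec(b) - vec(a) + vec(c)),
    excluding the three query words themselves, as in Mikolov et al."""
    query = vectors[b] - vectors[a] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], query))

# Toy 2-d "embeddings" constructed so the analogy holds; real models
# would use vectors learned by word2vec or GloVe from the corpus.
toy = {
    "kişi":    np.array([1.0, 0.0]),   # man
    "qadın":   np.array([0.0, 1.0]),   # woman
    "kral":    np.array([1.0, 1.0]),   # king
    "kraliça": np.array([0.0, 2.0]),   # queen
    "kitab":   np.array([0.5, 0.1]),   # book (distractor)
}

# kişi : kral :: qadın : ?  →  kraliça
print(analogy(toy, "kişi", "kral", "qadın"))
```

Accuracy on an analogy test set is then simply the fraction of quadruples for which the returned word matches the expected answer.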