Linguistic Resources and Topic Models for the Analysis of Persian Poems
Abstract: This paper describes the use of Natural Language Processing tools, mainly probabilistic topic modeling, to study semantics (word correlations) in a collection of roughly 18k Persian poems by 30 different poets. For this study, we invested considerable effort in preprocessing and in developing a large-scope lexicon that supports both modern and ancient Persian. In the analysis step, we obtained meaningful results regarding the correlation between poets and topics, their evolution through time, and the correlation between the topics and the metre used in the poems. This work should thus provide valuable results to literature researchers, especially those working on stylistics or comparative literature.