Estimating Semantics Distance of Texts Based on Used Terms Analysis

Krystian Wojtkiewicz, Michal Kawa

Published: 2021, Last Modified: 02 Dec 2024, ICCCI 2021, CC BY-SA 4.0

Abstract: Every day we encounter countless texts on various topics. As the number of resources grows, it becomes harder to find valuable content. Automated tools are needed to evaluate texts and recommend further reading based on similarity to a text the reader has already chosen. This paper introduces a method for calculating the semantic distance between two texts. We use well-known morphological tools to disambiguate the meaning and function of each word in the text. Next, we create similarity matrices using the weights of WordNet synset relations. Each term can belong to one of three sets, which represent three levels of semantic distance. When calculating the distance between texts, we also consider statistical characteristics that rely in part on term identification. The primary emphasis, however, is on the meaning each word brings to the utterance.
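
To make the pipeline described in the abstract more concrete, the following is a minimal sketch of how a term-level similarity matrix built from weighted relations might be turned into a distance between two texts. It is an illustration under stated assumptions, not the authors' implementation: the relation weights, the three level thresholds, the toy relation table that stands in for WordNet, and all function names are hypothetical, and the aggregation rule (averaging each term's best match) is one simple choice among many.

```python
# Hypothetical relation weights; the abstract does not give the actual values.
RELATION_WEIGHTS = {"synonym": 1.0, "hypernym": 0.7, "meronym": 0.4}

# Toy stand-in for WordNet synset relations: (term_a, term_b) -> relation name.
TOY_RELATIONS = {
    ("car", "automobile"): "synonym",
    ("car", "vehicle"): "hypernym",
    ("wheel", "car"): "meronym",
}


def term_similarity(a, b):
    """Return a similarity in [0, 1] for two already disambiguated terms."""
    if a == b:
        return 1.0
    relation = TOY_RELATIONS.get((a, b)) or TOY_RELATIONS.get((b, a))
    return RELATION_WEIGHTS.get(relation, 0.0)


def similarity_matrix(terms_a, terms_b):
    """Rows follow terms_a, columns follow terms_b."""
    return [[term_similarity(a, b) for b in terms_b] for a in terms_a]


def level(score):
    """Map a similarity score to one of three levels (assumed thresholds)."""
    if score >= 0.9:
        return "close"
    if score >= 0.4:
        return "related"
    return "distant"


def text_distance(terms_a, terms_b):
    """Distance in [0, 1]: one minus the mean best-match similarity."""
    if not terms_a or not terms_b:
        return 1.0  # assumption: an empty text is maximally distant
    matrix = similarity_matrix(terms_a, terms_b)
    best_for_a = [max(row) for row in matrix]
    best_for_b = [max(column) for column in zip(*matrix)]
    scores = best_for_a + best_for_b
    return 1.0 - sum(scores) / len(scores)


if __name__ == "__main__":
    text_a = ["car", "wheel"]
    text_b = ["automobile", "vehicle"]
    for row in similarity_matrix(text_a, text_b):
        print([level(score) for score in row])
    print(round(text_distance(text_a, text_b), 3))
```

In a real setting the toy table would be replaced by lookups against WordNet after morphological analysis and word-sense disambiguation, and the statistical characteristics mentioned in the abstract would enter the final score alongside the term-level similarities.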