On the effectiveness of phrase distance measures on separability and cohesion of meaning: A multilingual review

Anonymous

16 Feb 2024 | ACL ARR 2024 February Blind Submission | Readers: Everyone
Abstract: This paper presents an automated method for evaluating phrase distance measures based on cohesion and diffusion measurements, eliminating the need for direct human judgment. The evaluation involves five homegrown datasets, each consisting of 200 headlines or abstracts from news articles, subdivided into 20 sets. Two datasets are in Arabic, while the others contain news in French, German, and English. Each set contains 10 texts with shared meaning but different cohesion, and diffusion is modeled by the distances between articles with different meanings. The benchmark for evaluating phrase distance measures combines Silhouette Index properties with the mean of Pearson correlations over pairs of distance matrices. Our findings reveal that Yule distance with binary embeddings consistently surpasses other measures. Phrase distance performance remains steady across languages, tokenizers, and sentence lengths.
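To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "binary embeddings" means word-presence vectors over a shared vocabulary and that "Yule distance" is the Yule dissimilarity as commonly defined for boolean vectors; neither is specified in the abstract. The function names `binary_embed`, `yule`, and `silhouette` and the toy headlines are hypothetical, and the Pearson-correlation part of the benchmark is not shown.

```python
def binary_embed(texts):
    """Map each text to a 0/1 vector marking which vocabulary words it contains.
    Assumption: whitespace tokenization; the paper's tokenizers may differ."""
    token_sets = [set(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*token_sets))
    return [[1 if w in s else 0 for w in vocab] for s in token_sets]


def yule(u, v):
    """Yule dissimilarity between two binary vectors (range 0 to 2)."""
    ntt = sum(1 for a, b in zip(u, v) if a and b)          # both 1
    ntf = sum(1 for a, b in zip(u, v) if a and not b)      # only u is 1
    nft = sum(1 for a, b in zip(u, v) if not a and b)      # only v is 1
    nff = sum(1 for a, b in zip(u, v) if not a and not b)  # both 0
    half_r = ntf * nft
    if half_r == 0:
        return 0.0
    return 2.0 * half_r / (ntt * nff + half_r)


def silhouette(dist, labels):
    """Mean silhouette coefficient from a precomputed distance matrix."""
    n = len(labels)
    scores = []
    for i in range(n):
        # Mean distance from point i to every cluster (excluding i itself).
        means = {}
        for lab in set(labels):
            ds = [dist[i][j] for j in range(n) if labels[j] == lab and j != i]
            if ds:
                means[lab] = sum(ds) / len(ds)
        a = means.get(labels[i])  # cohesion: mean distance within own set
        others = [m for lab, m in means.items() if lab != labels[i]]
        if a is None or not others:
            scores.append(0.0)    # singleton cluster or only one cluster
            continue
        b = min(others)           # separation: nearest other set
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


if __name__ == "__main__":
    # Hypothetical toy data: two "sets" of headlines with shared meaning.
    texts = [
        "storm hits the coast",
        "storm strikes the coast tonight",
        "bank raises interest rates",
        "bank lifts interest rates again",
    ]
    labels = [0, 0, 1, 1]
    vectors = binary_embed(texts)
    dist = [[yule(u, v) for v in vectors] for u in vectors]
    print(silhouette(dist, labels))
```

A higher mean silhouette indicates that texts sharing a meaning lie closer to each other than to texts from other sets under the chosen distance, which is the property the benchmark is described as rewarding.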
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: Arabic, English, French, German