Keywords: Correlation Analysis, Morphological Evaluation Metrics, Semantic Evaluation Metrics, Machine Translation
TL;DR: Correlation Analysis
Abstract: Machine translation evaluation methods can be roughly divided into three categories: manual evaluation, classical morphological evaluation, and semantic evaluation based on pre-trained models. The latter two categories contain numerous automatic evaluation metrics, from which we select seven commonly used morphological metrics and four commonly used semantic metrics and analyze the correlation between every pair of them. We compute Pearson, Kendall and Spearman correlation coefficients on 40 machine translation models covering both translation directions between Chinese and each of 20 foreign languages. The experimental results show that: (1) There is an extremely strong correlation among morphological evaluation metrics, indicating that the statistical results of the various morphological calculation methods tend to converge on large-scale data. (2) There is a strong correlation among semantic evaluation metrics, indicating that although the semantic spaces of the various pre-trained models differ, their statistical results on large-scale data also tend to be consistent. These ubiquitous correlations largely stem from the equivalence of human cognition and the economy of knowledge representation. (3) There is also a strong correlation between morphological and semantic evaluation metrics, which shows that the deep “semantics” currently promoted in various commercial hype is just another high-level “morphology”. This is because a Turing computing system can use symbols and operations to represent morphology directly and process it exactly, but can only simulate the representation of semantics and process it approximately with those same symbols and operations. (4) Each correlation coefficient between any two evaluation metrics differs significantly across languages, which indicates that morphology and semantics are inherent attributes of a language, and that better machine translation evaluation metrics should be tailored to the language being evaluated.
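The pairwise analysis described in the abstract can be illustrated with a minimal sketch. It computes Pearson, Spearman and Kendall (tau-b) coefficients for every pair of metrics in pure Python. The metric names (`morph_a`, `morph_b`, `sem_a`), the helper names and the score values are hypothetical toy inputs chosen for illustration; they are not the metrics, code or data used in the submission.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b, which corrects for ties in either variable."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + only_x_tied) * (untied + only_y_tied)
    )


def pairwise_correlations(scores):
    """Return {(metric_1, metric_2): {coefficient_name: value}} for all pairs."""
    result = {}
    for m1, m2 in combinations(sorted(scores), 2):
        x, y = scores[m1], scores[m2]
        result[(m1, m2)] = {
            "pearson": pearson(x, y),
            "spearman": spearman(x, y),
            "kendall": kendall_tau_b(x, y),
        }
    return result


# Hypothetical toy scores: one value per translated segment, per metric.
toy_scores = {
    "morph_a": [0.10, 0.35, 0.50, 0.80],
    "morph_b": [0.20, 0.30, 0.60, 0.90],
    "sem_a": [0.40, 0.55, 0.50, 0.95],
}

if __name__ == "__main__":
    for pair, coeffs in pairwise_correlations(toy_scores).items():
        print(pair, {name: round(value, 3) for name, value in coeffs.items()})
```

Running the same routine separately for each language pair would yield one set of coefficients per language, which is the kind of per-language comparison that finding (4) refers to.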
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: zip
Submission Number: 4399