Inspecting Soundness of AMR Similarity Metrics in terms of Equivalence and Inequivalence

Kyung Seo Ki, Bugeun Kim, Gahgene Gweon

Published: 2024, Last Modified: 07 Nov 2025*SEM@NAACL 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: In this study, we investigate soundness of current Abstract Meaning Representation (AMR) similarity metrics in terms of equivalence and inequivalence. Specifically, AMR guidelines provide several equivalence and inequivalence conditions to reflect the meaning aspect of the semantics. Thus, it is important to examine an AMR metric’s soundness, i.e., whether the metric correctly reflects the guidelines. However, the existing metrics have less investigated their soundness. In this work, we propose a new experimental method using simulated data and a series of statistical tests to verify the metric’s soundness. Our experimental result revealed that all existing metrics such as Smatch, SemBLEU, S2match, Smatch++, WWLK-theta, WWLK-k3e2n, and SEMA did not fully meet the AMR guidelines in terms of equivalence and inequivalence aspects. Also, to alleviate this soundness problem, we suggest a revised metric called Smatch#, which adopts simple graph standardization technique that can improve the soundness of an existing metric.