Inspecting Soundness of AMR Similarity Metrics in terms of Equivalence and Inequivalence

Published: 01 Jan 2024, Last Modified: 12 Dec 2024*SEM@NAACL 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: In this study, we investigate soundness of current Abstract Meaning Representation (AMR) similarity metrics in terms of equivalence and inequivalence. Specifically, AMR guidelines provide several equivalence and inequivalence conditions to reflect the meaning aspect of the semantics. Thus, it is important to examine an AMR metric’s soundness, i.e., whether the metric correctly reflects the guidelines. However, the existing metrics have less investigated their soundness. In this work, we propose a new experimental method using simulated data and a series of statistical tests to verify the metric’s soundness. Our experimental result revealed that all existing metrics such as Smatch, SemBLEU, S2match, Smatch++, WWLK-theta, WWLK-k3e2n, and SEMA did not fully meet the AMR guidelines in terms of equivalence and inequivalence aspects. Also, to alleviate this soundness problem, we suggest a revised metric called Smatch#, which adopts simple graph standardization technique that can improve the soundness of an existing metric.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview