Keywords: Norm Extraction, Similarity Metric, Large Language Model
Abstract: To introduce human-created conventions specified in natural language into multi-agent systems (MASs), it is necessary to convert them into rules that agents can process and reason about. In this paper, we introduce a quantifier-free higher-order formal language for representing these conventions. To facilitate the translation, we employ large language models (LLMs) to convert natural language conventions into this formal logic. However, assessing the quality of these automatically generated translations requires similarity metrics that capture semantic meaning while faithfully reflecting differences in logical structure. Existing metrics used in natural language to first-order logic (NL-FOL) translation tasks, such as BLEU and Logical Equivalence, prove insufficient for this purpose: they lack sensitivity to semantic similarities between conceptually related predicates and fail to account for structural differences between formulae. This paper proposes a novel similarity metric designed specifically to address these limitations by accounting for similarity across both semantic and structural dimensions. We test our proposed metric and compare it to BLEU and Logical Equivalence on the NL-FOL pairs from the MALLS dataset, and show that it provides more comprehensive and reliable similarity assessments.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Evaluation methodologies, Metrics, Statistical Testing for Evaluation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2315