Keywords: Norm Extraction, Similarity Metric, Large Language Model
Abstract: To introduce human-created conventions specified in natural language into multi-agent systems (MASs), it is necessary to convert them into rules that agents can process and reason about. In this paper, we introduce a quantifier-free higher-order formal language for representing these conventions. To facilitate the translation, we employ large language models (LLMs) to convert natural language conventions into this formal logic. However, assessing the quality of these automatically generated translations requires similarity metrics that capture semantic meaning while faithfully reflecting differences in logical structure. Existing metrics used in natural language to first-order logic (NL-FOL) translation tasks, such as BLEU and Logical Equivalence, prove insufficient for this purpose: they lack sensitivity to semantic similarities between conceptually related predicates and fail to account for structural differences between formulae. This paper proposes a novel similarity metric designed specifically to address these limitations by accounting for similarity across both semantic and structural dimensions. We test our proposed metric and compare it to BLEU and Logical Equivalence on the NL-FOL pairs from the MALLS dataset, and show that it provides more comprehensive and reliable similarity assessments.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Evaluation methodologies, Metrics, Statistical Testing for Evaluation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2315