Keywords: Multilingualism and Cross-Lingual NLP, Resources and Evaluation, Semantic Parsing, Language Modeling
Abstract: Translating natural language to first-order logic (NL2FOL) is crucial for logic-based reasoning, but most existing models and benchmarks focus on English, leaving cross-lingual generalization underexplored. In this study, we present the first comprehensive investigation of cross-lingual robustness in NL2FOL. To support this, we introduce Multi-MALLS, the first multilingual benchmark for NL2FOL, which extends the widely used MALLS dataset with high-quality translations into multiple languages. Using Multi-MALLS, we observe that state-of-the-art models suffer a significant performance drop on non-English inputs, highlighting their lack of robustness. To address this, we propose MA-FOL, a multi-agent framework that improves generalization without requiring any training on target languages. By decomposing the task into three modules, a language-agnostic structure generator, a language-specific predicate combiner, and a refinement component, MA-FOL achieves robust zero-shot generalization across diverse linguistic inputs. Additionally, we show that traditional evaluation metrics, such as Exact Match, often fail to assess semantic correctness. To remedy this, we introduce LLM-Judged Semantic Equivalence (SE), a new metric that leverages large language models (LLMs) to evaluate whether generated formulas preserve the intended meaning. Extensive experiments demonstrate that MA-FOL outperforms strong baselines on Multi-MALLS without any multilingual fine-tuning, and the SE metric reveals semantic correctness that traditional metrics miss. Overall, our work provides a benchmark to test the robustness of NL2FOL, a framework to improve it, and a metric to evaluate it more effectively.
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Multilingualism and Cross-Lingual NLP, Resources and Evaluation, Semantic Parsing, Language Modeling
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: English, Chinese, Japanese, Russian, French, German
Submission Number: 5754