Where the Cat Sat: A Multilingual Benchmark for Spatial Language Understanding

ACL ARR 2026 January Submission 3229 Authors

04 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: spatial language understanding, multilingual evaluation, typological diversity, morphological analysis, evaluation benchmarks
Abstract: Spatial language understanding is fundamental to human communication and reasoning, enabling tasks from robot navigation to document analysis and geographic information systems. Current spatial language understanding benchmarks exhibit a bias toward English and, in particular, toward prepositional marking. We present a novel framework for spatial language understanding and a multilingual benchmark that decomposes spatial relations into surface elements (figure, ground, predicate, markers) and semantic components (dynamicity, stasis). Evaluating frontier Large Language Models (LLMs) on Spanish, Basque, and Chinese, we find high accuracy on surface element identification but persistent gaps in semantic classification. Basque case affixes remain the most challenging (small models achieve as low as 15.3% accuracy on spatial markers), suggesting that morphological complexity poses difficulties even for large models. These results suggest that surface parsing does not entail spatial understanding, and that evaluation must include languages with diverse spatial marking strategies beyond prepositions.
Paper Type: Long
Research Area: Multilinguality and Language Diversity
Research Area Keywords: multilingual evaluation, multilingual benchmarks, linguistic theories, semantic parsing, evaluation methodologies, morphological analysis
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: Spanish, Chinese, Basque
Submission Number: 3229