Evaluating large language-vision models on geographic language understanding

ACL ARR 2024 June Submission 3164 Authors

15 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Geographic language understanding (GLU) tasks ask models to map from text to maps. Geographical complex description parsing (GCDP) is a GLU task where models must assign sets of map coordinates to text that goes beyond a single named location, such as "...between the towns of Adrano and S. Maria di Licodia, 32 kilometres northwest of Catania". In GCDP, the input is both a text and a set of reference geometries for known places in the text (e.g., Adrano, S. Maria di Licodia, Catania), and the output is the geometry of the location described. In this paper, we convert a GCDP corpus into an image + text $\rightarrow$ image benchmark to evaluate recent large language-vision models on this complex task. The models show weak performance, and analysis reveals a lack of understanding even of simpler tasks such as recognizing regions by color.
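For concreteness, here is a minimal sketch (not taken from the paper; the structure, field names, and coordinates are illustrative assumptions) of how a single GCDP example could be represented as text plus reference geometries mapping to a target geometry, using shapely:

```python
# Hypothetical representation of one GCDP example (illustrative only).
# Inputs: the description text and reference geometries for the named places.
# Output: the geometry of the location being described.
from shapely.geometry import Point, Polygon

example = {
    "text": ("...between the towns of Adrano and S. Maria di Licodia, "
             "32 kilometres northwest of Catania"),
    # Reference geometries for known places mentioned in the text,
    # simplified here to (longitude, latitude) points; approximate values.
    "references": {
        "Adrano": Point(14.83, 37.66),
        "S. Maria di Licodia": Point(14.89, 37.61),
        "Catania": Point(15.09, 37.50),
    },
    # Gold output: an (invented) polygon for the described location.
    "target": Polygon([(14.84, 37.62), (14.88, 37.64),
                       (14.87, 37.60), (14.84, 37.62)]),
}

# A system prediction could then be scored against example["target"],
# e.g. by geometric overlap.
print(example["target"].centroid)
```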
Paper Type: Short
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: geoparsing, language understanding, multimodality
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 3164