Evaluating large language-vision models on geographic language understanding

ACL ARR 2024 June Submission 3164 Authors

15 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Geographic language understanding (GLU) tasks ask models to map from text to maps. Geographical complex description parsing (GCDP) is a GLU task where models must assign sets of map coordinates to text that goes beyond a single named location, such as "...between the towns of Adrano and S. Maria di Licodia, 32 kilometres northwest of Catania". In GCDP, the input is both a text and a set of reference geometries for known places in the text (e.g., Adrano, S. Maria di Licodia, Catania), and the output is the geometry of the location described. In this paper, we convert a GCDP corpus into an image + text $\rightarrow$ image benchmark to evaluate recent large language-vision models on this complex task. The models show weak performance, and analysis reveals a lack of understanding even of simpler tasks such as recognizing regions by color.
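For concreteness, here is a minimal sketch (not taken from the paper; the structure, field names, and coordinates are illustrative assumptions) of how a single GCDP example could be represented as text plus reference geometries mapping to a target geometry, using shapely:

```python
# Hypothetical representation of one GCDP example (illustrative only).
# Inputs: the description text and reference geometries for the named places.
# Output: the geometry of the location being described.
from shapely.geometry import Point, Polygon

example = {
    "text": ("...between the towns of Adrano and S. Maria di Licodia, "
             "32 kilometres northwest of Catania"),
    # Reference geometries for known places mentioned in the text,
    # simplified here to (longitude, latitude) points; approximate values.
    "references": {
        "Adrano": Point(14.83, 37.66),
        "S. Maria di Licodia": Point(14.89, 37.61),
        "Catania": Point(15.09, 37.50),
    },
    # Gold output: an (invented) polygon for the described location.
    "target": Polygon([(14.84, 37.62), (14.88, 37.64),
                       (14.87, 37.60), (14.84, 37.62)]),
}

# A system prediction could then be scored against example["target"],
# e.g. by geometric overlap.
print(example["target"].centroid)
```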
Paper Type: Short
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: geoparsing, language understanding, multimodality
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 3164