Patient Visualization Enhances Spatial Reasoning in GPT Models

ACL ARR 2024 August Submission439 Authors

16 Aug 2024 (modified: 21 Sept 2024) · CC BY 4.0
Abstract: While large language models (LLMs) dominate the field of natural language processing, with GPT among the leaders, it remains an open question how well these models perform spatial reasoning. Contrary to recent studies suggesting that LLMs struggle with spatial reasoning tasks, we demonstrate in this paper that a novel prompting technique, termed Patient Visualization of Thought (Patient-VoT), can boost GPTs' spatial reasoning abilities. The core idea behind Patient-VoT is to tackle (1) spatial understanding and (2) spatial reasoning in two separate steps, each guided by a key trigger word: "bullet list" and "coordinate", respectively. By applying Patient-VoT, we achieve an average accuracy improvement of up to 35% (absolute) over the state-of-the-art visual prompting technique, Visualization-of-Thought. Our findings show that, when effectively prompted, GPTs are far more proficient in spatial tasks than commonly believed.
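The abstract only names the two stages and their trigger words; the exact prompt wording is not given. The following is a minimal sketch of how such a two-step prompt could be assembled, assuming a hypothetical helper `patient_vot_prompt` (the function name and all wording are illustrative, not the authors' actual prompts):

```python
def patient_vot_prompt(task_description: str) -> str:
    """Compose a two-step prompt in the spirit of Patient-VoT.

    Hypothetical sketch: the paper specifies only that spatial
    understanding is triggered by "bullet list" and spatial reasoning
    by "coordinate"; the surrounding phrasing here is an assumption.
    """
    understanding_step = (
        "Step 1 (spatial understanding): restate every object and its "
        "position as a bullet list."
    )
    reasoning_step = (
        "Step 2 (spatial reasoning): assign each object a coordinate, "
        "then reason over those coordinates to answer the question."
    )
    return f"{task_description}\n\n{understanding_step}\n{reasoning_step}"


prompt = patient_vot_prompt(
    "A cup is left of a book, and the book is left of a lamp. "
    "Where is the lamp relative to the cup?"
)
print(prompt)
```

The resulting prompt string would then be sent to a GPT model as-is; separating understanding from reasoning is what the abstract credits for the accuracy gains.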
Paper Type: Short
Research Area: Language Modeling
Research Area Keywords: Prompting
Contribution Types: Position papers
Languages Studied: English
Submission Number: 439