Grounding spatial relations in text-only language models

Published: 01 Jan 2024, Last Modified: 16 May 2025, Neural Networks 2024, CC BY-SA 4.0
Abstract: Highlights
• A novel textual representation for complex scenes based on location tokens.
• Location tokens allow language models to ground spatial relations between objects.
• Using an automatically generated synthetic dataset, we train language models for spatial grounding.
• The learned grounding mechanisms transfer to the Visual Spatial Reasoning dataset.
• An extensive analysis shows the importance of location tokens and spatial training.
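
The highlights do not spell out what a location token looks like, so the following is only an illustrative sketch of one way a scene could be turned into text for a text-only language model. It assumes objects come with normalized bounding boxes and that each coordinate is quantized into one of a fixed number of bins. The names `DetectedObject`, `location_token`, `scene_to_text`, the `<loc_NN>` token format, and the choice of 32 bins are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectedObject:
    """Hypothetical container: an object label plus a normalized bounding box."""
    label: str
    # (x_min, y_min, x_max, y_max), each assumed to lie in [0, 1]
    box: Tuple[float, float, float, float]


def location_token(value: float, bins: int = 32) -> str:
    """Quantize one normalized coordinate into a discrete token such as '<loc_07>'.

    The token format and bin count are assumptions made for illustration.
    """
    clamped = min(max(value, 0.0), 1.0)
    # A value of exactly 1.0 would index one past the last bin, so cap it.
    index = min(int(clamped * bins), bins - 1)
    return f"<loc_{index:02d}>"


def scene_to_text(objects: List[DetectedObject], bins: int = 32) -> str:
    """Serialize a scene as 'label <loc> <loc> <loc> <loc>' entries joined by ' ; '."""
    parts = []
    for obj in objects:
        tokens = " ".join(location_token(v, bins) for v in obj.box)
        parts.append(f"{obj.label} {tokens}")
    return " ; ".join(parts)


if __name__ == "__main__":
    scene = [
        DetectedObject("cat", (0.0, 0.5, 0.5, 1.0)),
        DetectedObject("table", (0.25, 0.0, 1.0, 0.5)),
    ]
    print(scene_to_text(scene))
```

Under these assumptions, a spatial statement such as "the cat is below the table" could in principle be checked against the serialized scene by comparing the vertical location tokens of the two objects, which is the kind of grounding the highlights describe.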