Gestalt Vision: A Dataset for Evaluating Gestalt Principles in Visual Perception

Published: 20 Apr 2025, Last Modified: 29 Aug 2025 · NeSy 2025 Poster · CC BY 4.0
Keywords: Logic Scene Reasoning, FOL, neuro-symbolic AI, benchmark
TL;DR: Gestalt Vision is a benchmark that tests AI models on recognizing and reasoning about structured visual patterns using Gestalt principles. It highlights current limitations and the need for better perceptual mechanisms in AI reasoning.
Track: Main Track
Abstract: Gestalt principles, established in the 1920s, describe how humans perceive individual elements as cohesive wholes. These principles, including proximity, similarity, closure, continuity, and symmetry, play a fundamental role in human perception, enabling structured visual interpretation. Despite their significance, existing AI benchmarks fail to assess models' ability to infer patterns at the group level, where multiple objects that follow the same Gestalt principle are perceived as a single group. To address this gap, we introduce Gestalt Vision, a diagnostic framework designed to evaluate whether AI models can not only identify groups within patterns but also reason about the underlying logical rules that govern them. Gestalt Vision provides structured visual tasks and baseline evaluations spanning neural, symbolic, and neural-symbolic approaches, uncovering key limitations in current models' ability to perform human-like visual cognition. Our findings emphasize the necessity of incorporating richer perceptual mechanisms into AI reasoning frameworks. By bridging the gap between human perception and computational models, Gestalt Vision offers a crucial step toward developing AI systems with improved perceptual organization and visual reasoning capabilities.
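To make the group-level setting concrete, the following is a minimal, hypothetical sketch (not the benchmark's released code or API) of what grouping by the proximity principle looks like on raw 2D coordinates: points within a distance threshold of one another are merged into a single perceived group. The function name, threshold, and example points are illustrative assumptions only; models evaluated on Gestalt Vision are expected to recover such groupings, and the rules behind them, from rendered scenes rather than coordinate lists.

```python
# Hypothetical illustration of Gestalt proximity grouping; not the Gestalt Vision API.
from math import dist

def proximity_groups(points, threshold=1.0):
    """Merge points into groups whose members lie within `threshold`
    of at least one other member (single-linkage style clustering)."""
    groups = []
    for p in points:
        # Collect every existing group that p is "close" to, then merge them with p.
        near = [g for g in groups if any(dist(p, q) <= threshold for q in g)]
        merged = [p]
        for g in near:
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    return groups

# Two visually separated clusters: a human observer perceives two groups.
points = [(0, 0), (0.5, 0.2), (0.3, 0.8), (5, 5), (5.4, 5.1)]
print([len(g) for g in proximity_groups(points)])  # [3, 2]
```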
Paper Type: Long Paper
Software: https://github.com/ml-research/ELVIS
Submission Number: 42