SCANS: An Efficient Geometric Problem Solver with Content-Aware Attention and Adaptive Fusion

Published: 2025, Last Modified: 21 Jan 2026, ICDAR (Workshops 1) 2025, CC BY-SA 4.0
Abstract: Geometric problem solving (GPS) involves inferring unknown quantities or relations from a diagram and its accompanying textual description. It is a visual reasoning task that poses two difficulties: first locating the key geometric elements in the diagram, and then merging them with the relevant description. Existing neural solvers often fail to exploit the inherent structural information in geometric diagrams because they do not distinguish between content-rich and blank regions during feature extraction. They also typically neglect the varying importance of features during multi-modal fusion. We propose a salient content-aware neural solver (SCANS) to address these two challenges. Specifically, a content-aware structural attention module first constructs a structural prior, which enables the module to focus on the informative regions of the diagram while suppressing the empty background. A hierarchical adaptive fusion module then merges visual and textual features across the spatial, channel, and modality dimensions and learns task-specific fusion weights. This design facilitates fine-grained perception of geometric relations and balances the contributions of the two modalities. Extensive experiments on the Geometry3K and PGPS9K datasets validate the effectiveness of the proposed SCANS.
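The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`content_prior`, `content_aware_pool`, `gated_fusion`), the patch size, and the choice of an ink-density prior and a sigmoid gate are all assumptions made for illustration. The sketch only shows a prior that down-weights blank patches and a single per-channel gate between modalities, whereas the paper describes fusion across spatial, channel, and modality dimensions.

```python
import numpy as np

def content_prior(diagram, patch=4, blank=1.0):
    """Hypothetical structural prior: share of non-blank pixels per patch.

    `diagram` is a 2-D grayscale array in which `blank` marks background.
    """
    h, w = diagram.shape
    gh, gw = h // patch, w // patch
    # Split the image into a (gh, gw) grid of patch x patch tiles.
    tiles = diagram[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    ink = (tiles < blank).mean(axis=(1, 3))  # fraction of drawn pixels per tile
    return ink.reshape(-1)

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def content_aware_pool(patch_feats, prior, eps=1e-6):
    """Pool patch features with weights biased toward content-rich patches."""
    # log(prior) gives blank patches a large negative score, so the
    # softmax assigns them almost no weight.
    weights = softmax(np.log(prior + eps))
    return weights @ patch_feats, weights

def gated_fusion(visual, textual, w_gate):
    """Assumed fusion rule: a per-channel sigmoid gate mixes the two modalities."""
    logits = np.concatenate([visual, textual]) @ w_gate
    gate = 1.0 / (1.0 + np.exp(-logits))
    return gate * visual + (1.0 - gate) * textual, gate

# Toy usage: an 8x8 white canvas with one short stroke in the top-left patch.
diagram = np.ones((8, 8))
diagram[0:4, 0] = 0.0
prior = content_prior(diagram)                      # one value per 4x4 patch
patch_feats = np.random.default_rng(0).normal(size=(4, 3))
visual, weights = content_aware_pool(patch_feats, prior)
textual = np.zeros(3)
fused, gate = gated_fusion(visual, textual, np.zeros((6, 3)))
```

In a trained model the gate weights `w_gate` would be learned rather than fixed, and the prior would typically modulate attention inside the visual encoder rather than a single pooling step.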