TopoEval: A Comprehensive Benchmark for Topological Reasoning in Foundation Models

ACL ARR 2026 January Submission10093 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, License: CC BY 4.0
Keywords: Benchmark evaluation; Large language model evaluation; Topological reasoning; Multimodal large language models; AI for mathematics; Scaling laws
Abstract: Topological reasoning---the ability to identify structural invariants under continuous deformations---is a cornerstone of human visual cognition, yet it remains under-explored in modern AI systems. Existing benchmarks often treat topology superficially, lacking systematic coverage and depth. To bridge this gap, we introduce TopoEval, a meticulously curated topological reasoning benchmark comprising 400 high-quality problems adapted from real-world math competitions and professional textbooks. TopoEval features a rigorous hierarchical taxonomy, spanning 4 major topological branches subdivided into 12 fine-grained subfields, and is graded across 3 difficulty levels. Incorporating tasks that demand complex visual reasoning, our dataset presents a comprehensive challenge to foundation models. Through extensive experimentation, we systematically investigate the impact of model scaling, reasoning depth, and prompt engineering strategies on performance. Furthermore, detailed error analysis unveils the deficiencies of current models in topological reasoning, providing critical directions for developing systems with genuine visual understanding and reasoning capabilities.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, visual question answering, logical reasoning, scaling
Contribution Types: Data resources
Languages Studied: English
Submission Number: 10093