CombiGraph-Vis: A Curated Multimodal Olympiad Benchmark for Discrete Mathematical Reasoning

Published: 17 Oct 2025, Last Modified: 21 Nov 2025. Venue: MATH-AI 2025 Poster. License: CC BY 4.0
Keywords: Discrete Math Benchmark, Combinatorics Benchmark
TL;DR: We introduce a new, verifiable benchmark for discrete mathematics.
Abstract: CombiGraph-Vis is a 1,135-problem benchmark for discrete mathematical reasoning spanning 13 domains and three formats (short-answer, multiple-choice, and yes/no). Notably, 35% of the problems include images whose structure is essential to solving them. Each problem comes with a verified solution and technique labels, and the entire dataset was curated and validated through agentic workflows under human oversight to ensure consistency and fidelity. Evaluations across diverse model families reveal a wide performance range (16% to 78% accuracy), with particularly sharp drops on image-based problems. For standalone multiple-choice problems, clear gaps emerge between correct-answer accuracy and among-choices accuracy, indicating vulnerability to trap choices. The benchmark emphasizes reasoning over graphs, grids, and other combinatorial objects. We release the dataset, solutions, technique labels, and evaluation code to support research on robust multimodal discrete-math reasoning.
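The abstract reports accuracy across three answer formats and highlights the gap between image-based and text-only problems. The following is a minimal sketch of how such a breakdown could be computed. It is not the released evaluation code. The `Problem` record layout, its field names (`pid`, `fmt`, `has_image`, `answer`), and the `normalize` rules are all hypothetical assumptions made for illustration. The sketch grades by exact match after light normalization and does not attempt to reproduce the among-choices metric mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Problem:
    # Hypothetical record layout; the released dataset's actual fields may differ.
    pid: str
    fmt: str          # assumed values: "short_answer", "multiple_choice", "yes_no"
    has_image: bool   # True if the problem statement includes an image
    answer: str       # reference answer as a string


def normalize(ans: str, fmt: str) -> str:
    """Canonicalize an answer before exact-match comparison (assumed rules)."""
    s = ans.strip().lower()
    if fmt == "multiple_choice":
        s = s.strip("().")  # treat "(B)", "b.", and "B" as the same choice
    elif fmt == "yes_no":
        s = {"y": "yes", "n": "no"}.get(s, s)
    return " ".join(s.split())  # collapse internal whitespace


def accuracy_breakdown(problems, predictions):
    """Return accuracy overall, per format, and for image vs. text-only problems.

    `predictions` maps a problem id to the model's raw answer string;
    a missing prediction counts as incorrect.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for p in problems:
        pred = predictions.get(p.pid, "")
        correct = normalize(pred, p.fmt) == normalize(p.answer, p.fmt)
        # Each problem contributes to three buckets.
        for key in ("overall", p.fmt, "image" if p.has_image else "text_only"):
            totals[key] += 1
            hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}
```

For example, with one text-only short-answer problem answered correctly, one image-based multiple-choice problem answered as "(b)" against reference "B", and one image-based yes/no problem answered wrongly, the sketch reports 2/3 overall, 1.0 for text-only, and 0.5 for image-based problems. Exact string matching is deliberately simple here; numeric or symbolic short answers would likely need a more tolerant matcher.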
Submission Number: 180