Keywords: benchmark, combinatorics, neural theorem proving, formal methods, large language models, AI for Math, formal reasoning
Abstract: Neurosymbolic approaches that integrate large language models with formal reasoning have recently achieved human-level performance on mathematics competition problems in algebra, geometry, and number theory. By contrast, combinatorics remains a challenging domain, characterized by a lack of appropriate benchmarks and theorem libraries. To address this gap, we introduce CombiBench, a comprehensive benchmark comprising 100 combinatorial competition problems, each formalized in Lean 4 and paired with its corresponding informal statement. The problems cover a wide spectrum of difficulty levels, ranging from middle school to IMO and university level, and span more than ten combinatorial topics.
Furthermore, we provide a comprehensive and standardized evaluation framework for formal mathematics. It accommodates not only proof-based problems but also, for the first time, the evaluation of fill-in-the-blank questions. We open-source the benchmark dataset together with the code of the proposed evaluation framework.
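To illustrate what a fill-in-the-blank item might look like in Lean 4, a minimal sketch is given below. The names (`answer`, `problem`) and the specific problem are illustrative assumptions, not the benchmark's actual encoding; the idea is that a solver must replace the first `sorry` with a closed-form value and the second with a proof that the value satisfies the stated condition.

```lean
import Mathlib

-- Hypothetical fill-in-the-blank item (illustrative only, not from CombiBench):
-- "How many subsets does a set with 3 elements have?"

/-- The blank to be filled in: the solver replaces `sorry` with a numeral. -/
def answer : ℕ := sorry

/-- The formal statement tying the blank to the problem's condition;
    the solver must also supply this proof. -/
theorem problem (s : Finset ℕ) (h : s.card = 3) :
    s.powerset.card = answer := by
  sorry
```

Under this hypothetical encoding, a correct submission would set `answer := 8` and close the proof using `Finset.card_powerset` together with the hypothesis `h`.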
Primary Area: datasets and benchmarks
Submission Number: 11422