AntiderivBench: Evaluating language models on indefinite integration

Published: 17 Oct 2025, Last Modified: 21 Nov 2025
Venue: MATH-AI 2025 Poster
License: CC BY 4.0
Keywords: large language models, reasoning, integration, benchmark
TL;DR: We construct a benchmark with challenging indefinite integration problems and evaluate LLMs on it.
Abstract: We present AntiderivBench, a benchmark of indefinite integration problems extracted from the challenging annual MIT Integration Bee competition. We evaluate a number of frontier closed models as well as smaller open-source models on it. Additionally, we create more challenging versions of the benchmark by symbolically manipulating the original competition problems. We envision that the benchmark will be useful for evaluating the reasoning capabilities of LLMs and for experimenting with post-training LLM pipelines that rely on verifiable rewards.
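The abstract does not specify how answers are verified, but indefinite integration admits a natural verifiable reward: a candidate antiderivative can be checked by symbolic differentiation. The sketch below is a minimal, hypothetical illustration of such a check using SymPy; the function name and acceptance criterion are assumptions, not the authors' pipeline.

```python
# Hypothetical sketch of a verifiable reward for indefinite integration:
# accept a candidate F(x) if d/dx F(x) - f(x) simplifies to zero, i.e. F
# matches a true antiderivative up to an additive constant. This is NOT
# necessarily the checking procedure used by AntiderivBench.
import sympy as sp

x = sp.symbols('x')

def is_correct_antiderivative(integrand: sp.Expr, candidate: sp.Expr) -> bool:
    """Return True if `candidate` differentiates back to `integrand` in x."""
    residual = sp.simplify(sp.diff(candidate, x) - integrand)
    return residual == 0

# Example: a model answers x*log(x) - x for the integrand log(x).
integrand = sp.log(x)
candidate = x * sp.log(x) - x
print(is_correct_antiderivative(integrand, candidate))  # True
```

Note that `sp.simplify` can fail to reduce some correct answers to zero, so a real grader would likely combine symbolic simplification with numerical spot checks at sample points.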
Supplementary Material: zip
Submission Number: 127