MathGenBench: Benchmarking Detection of Machine-Generated Mathematical Text

ACL ARR 2025 May Submission2951 Authors

19 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, Readers: Everyone, License: CC BY 4.0

Abstract: The rapid advancement of large language models (LLMs) has heightened concerns about their misuse in generating deceptive mathematical content. To address the lack of specialized benchmarks for detecting machine-generated mathematical text, we introduce MathGenBench, the first comprehensive benchmark for this task. Our benchmark combines authentic human-written content from arXiv, Mathematics Stack Exchange (MSE), and Wikipedia with machine-generated samples produced by 10 leading language models. To simulate real-world adversarial scenarios, we apply text manipulation strategies, including paraphrase attacks and perturbation attacks. Building upon the TOCSIN framework, we propose TOCSIN*, which improves detection robustness through a learnable linear aggregation of token cohesiveness and zero-shot scores. Extensive experiments demonstrate TOCSIN*'s superiority over existing methods across different scenarios. This work provides critical tools for combating machine-generated mathematical text.
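
The abstract does not spell out how TOCSIN* aggregates its two signals, so the following is only a minimal sketch of what a "learnable linear aggregation" could look like: a logistic regression over a token-cohesiveness score and a zero-shot detector score, trained by plain gradient descent. The function names, the sigmoid link, the hyperparameters, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
import math


def fit_linear_aggregator(cohesiveness, zero_shot, labels, lr=0.1, epochs=2000):
    """Learn weights (w_c, w_z, b) for the score w_c*c + w_z*z + b.

    Hypothetical helper: labels are 1 for machine-generated, 0 for human-written.
    Trained with batch gradient descent on the logistic loss.
    """
    w_c, w_z, b = 0.0, 0.0, 0.0
    n = len(labels)
    for _ in range(epochs):
        g_c = g_z = g_b = 0.0
        for c, z, y in zip(cohesiveness, zero_shot, labels):
            p = 1.0 / (1.0 + math.exp(-(w_c * c + w_z * z + b)))
            err = p - y  # gradient of the logistic loss w.r.t. the logit
            g_c += err * c
            g_z += err * z
            g_b += err
        w_c -= lr * g_c / n
        w_z -= lr * g_z / n
        b -= lr * g_b / n
    return w_c, w_z, b


def aggregate_score(c, z, weights):
    """Combine one cohesiveness score and one zero-shot score linearly."""
    w_c, w_z, b = weights
    return w_c * c + w_z * z + b


# Toy, made-up scores: the first two texts are human-written, the last two machine-generated.
toy_cohesiveness = [0.1, 0.2, 0.8, 0.9]
toy_zero_shot = [-1.0, -0.5, 0.5, 1.0]
toy_labels = [0, 0, 1, 1]

weights = fit_linear_aggregator(toy_cohesiveness, toy_zero_shot, toy_labels)
toy_scores = [aggregate_score(c, z, weights)
              for c, z in zip(toy_cohesiveness, toy_zero_shot)]
```

In this sketch a higher aggregated score means "more likely machine-generated", and a decision threshold would still have to be chosen on held-out data.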

Paper Type: Long

Research Area: Generation

Research Area Keywords: benchmarking; misinformation detection; adversarial attacks

Contribution Types: Publicly available software and/or pre-trained models, Data resources

Languages Studied: English

Submission Number: 2951
