## College-level Math Formalization Benchmark Dataset
These two datasets serve as a standard test suite for measuring the capability of Lean 4 auto-formalization models on undergraduate-level and graduate-level mathematics.

- **Extract Theorem:** A custom dataset compiled by extracting theorems from advanced undergraduate-level textbooks using OCR on scanned materials. It covers a wide range of mathematical topics and includes multilingual content.

- **College CoT:** A curated dataset derived from digital mathematics resources across the internet, with content verified and filtered using a large language model (LLM) to ensure quality and relevance.
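
To make the task concrete, here is a minimal sketch of what auto-formalization means: a model receives an informal statement and must produce a Lean 4 statement that type-checks. The example is illustrative only and is not taken from either dataset. The theorem name `sum_of_evens_is_even` is hypothetical, the statement is deliberately simpler than the college-level material described above, and it assumes a project with Mathlib available. The proof is left as `sorry` because the sketch concerns only the formal statement.

```lean
import Mathlib

-- Informal statement: "The sum of two even integers is even."
-- Hypothetical formalization target; the name and phrasing are assumptions.
theorem sum_of_evens_is_even (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  sorry
```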