Dear Action Editor,
We have now submitted the camera-ready version of our paper and added a new co-author, as discussed previously.
We have addressed all the feedback in your comments. Here is a list of the changes in our final revision:
We extended our evaluation to 6 new models, as presented in Tables 5, 6, and E4. As you requested, we evaluated the accuracy of Qwen Math 2.5 and Llama 3.1 on our dataset (Table E4). We also evaluated the accuracy of Goedel Prover, DeepSeek Prover 1.5, and ReProver. Goedel Prover was released this February with SOTA accuracy on miniF2F; DeepSeek Prover was the previous SOTA model until this February and turns out to achieve the best accuracy (39%) on our dataset. ReProver is an established model for automated theorem proving with an open-source training set. Additionally, we updated our experiments from o1-mini to o3-mini.
We have revised our introduction to reflect our discussions with the reviewers and to explain how we suggest our dataset be used.
We have added further analysis examining the lengths of the proofs that these LLMs can correctly produce.
Thank you again for your helpful feedback.
Sincerely,
The Authors