\multicolumn{5}{l}{gpt-4-0613} \\
math_word_problem_generation   &   140 &   252 &   151 & 55.0 \\
finegrained_fact_verification  &   140 &   521 &    82 & 55.7 \\
answerability_classification   &   140 &   119 &    75 & 57.9 \\

\multicolumn{5}{l}{meta-llama/Llama-2-70b-chat-hf} \\
math_word_problem_generation   &   160 &   235 &   163 & 75.0 \\
finegrained_fact_verification  &   160 &   509 &   168 & 78.8 \\
answerability_classification   &   160 &   118 &    97 & 77.5 \\

