
Model,MMLU-Pro,MMLU,TriviaQA,GPQA-D,SimpleQA,BBH,AGIEval-en,GSM8K,MATH,MBPP,HumanEval
Claude 3 Haiku,0.4229,0.75,,,,,,0.89,,,0.76
Claude 3 Sonnet,0.568,0.79,,,,,,0.92,,,0.73
Claude 3 Opus,0.6845,0.87,,,,,,0.95,,,0.85
Claude 3.5 Sonnet,0.7764,0.98,0.926,0.755,0.284,,,0.96,,,0.94
Command R+,0.379,0.76,,,,,,0.71,,,
Gemini 1.0 Pro,,0.72,,,,,,,,,
Gemini 1.5 Flash,0.6409,0.79,,,,,,0.86,,,0.74
Gemini 1.5 Pro,0.7025,0.86,,,,,,0.91,,,0.84
GPT-3.5 Turbo,,0.7,,,,,,,,,0.68
GPT-4 Turbo,0.6371,0.86,,,,,,,0.73,,0.87
GPT-4 (1106),,0.836,0.731,0.404,,0.827,,0.805,0.619,0.786,0.744
GPT-4 (0409),,0.842,0.829,0.485,,0.785,,0.797,0.712,0.77,0.823
GPT-4o,0.7468,0.86,,,0.39,,,,,,
Llama 3 8B Instruct,0.4098,0.674,,,,,,0.951,,,0.805
Llama 3 70B Instruct,0.562,0.805,0.898,0.389,,0.805,,0.902,0.471,0.716,0.726
Mistral Large 2,,0.812,0.892,,,,,0.93,0.962,,0.92
Mixtral 8x22B,0.5633,0.773,,0.364,,0.791,,0.765,0.5,0.689,0.72
o1-mini,0.803,0.852,,,,,,,,,0.924
o1-preview,0.893,0.81,,,,,,,,,
