
Model,MMLU,TriviaQA_wiki_1shot,GPQA_diamond,BBH,GSM8K,Math,HumanEval,MBPP
GPT-4-1106,0.836,0.731,0.404,0.827,0.805,0.619,0.744,0.786
GPT-4-0409,0.842,0.829,0.485,0.785,0.797,0.712,0.823,0.77
Claude-3-Opus,0.846,0.824,0.465,0.785,0.877,0.602,0.762,0.767
Llama-3-70B-Instruct,0.805,0.898,0.389,0.805,0.902,0.471,0.726,0.716
Mixtral-8x22B-Instruct-v0.1,0.772,0.897,0.364,0.791,0.883,0.5,0.72,0.689
