Model,overall,instruct,plan,reason,retrieve,understand,review,opensourced,link
GPT-3.5-Turbo (Nov 2023),84.0, 96.6, 86.6, 67.8, 92.2, 85.5, 75.6,FALSE,https://openai.com/blog/new-models-and-developer-products-announced-at-devday
GPT-4-Turbo (Nov 2023),86.4, 96.3, 87.8, 65.3, 88.9, 85.8, 94.5,FALSE,https://openai.com/blog/new-models-and-developer-products-announced-at-devday
Claude-2.1 (Nov 2023),78.8, 97.8, 86.0, 62.8, 77.3, 78.5, 70.4,FALSE,https://www.anthropic.com/index/claude-2-1
LLaMA2-7B,27.4, 34.5, 28.1, 22.1, 16.9, 24.4, 38.6,TRUE,https://ai.meta.com/llama/
Vicuna-7B,44.8, 48.0, 30.6, 48.8, 22.5, 60.5, 58.5,TRUE,https://github.com/lm-sys/FastChat
InternLM-7B,45.8, 39.1, 55.4, 36.9, 47.1, 50.3, 46.2,TRUE,https://github.com/InternLM/InternLM
CodeLLaMA-7B,28.6, 48.5, 52.8, 14.8, 2.4, 13.4, 40.0,TRUE,https://github.com/facebookresearch/codellama
AgentLM-7B,41.4, 46.9, 34.5, 33.8, 42.0, 46.2, 44.8,TRUE,https://github.com/THUDM/AgentTuning
Baichuan2-7B,56.5, 73.0, 52.3, 41.3, 51.1, 59.6, 61.4,TRUE,https://github.com/baichuan-inc/Baichuan2
ChatGLM3-6B,51.4, 72.0, 42.7, 36.2, 45.2, 57.8, 54.8,TRUE,https://github.com/THUDM/ChatGLM3
Qwen-7B,59.5, 61.5, 64.7, 45.2, 62.1, 61.9, 61.6,TRUE,https://github.com/QwenLM/Qwen
LLaMA2-13B,37.3, 33.4, 56.9, 26.4, 24.7, 29.4, 53.0,TRUE,https://ai.meta.com/llama/
Vicuna-13B,48.1, 48.9, 39.9, 52.7, 20.4, 65.9, 60.8,TRUE,https://github.com/lm-sys/FastChat
WizardLM-13B,49.0, 39.8, 59.2, 30.6, 45.4, 47.8, 71.5,TRUE,https://github.com/nlpxucan/WizardLM
Baichuan2-13B,50.3, 29.9, 60.8, 41.9, 55.7, 56.0, 57.3,TRUE,https://github.com/baichuan-inc/Baichuan2
Qwen-14B,66.3, 73.7, 74.7, 52.4, 75.6, 64.7, 56.9,TRUE,https://github.com/QwenLM/Qwen
LLaMA2-70B,53.0, 79.0, 60.5, 31.1, 39.5, 44.8, 62.8,TRUE,https://ai.meta.com/llama/
WizardLM-70B,44.2, 20.6, 62.1, 42.7, 47.2, 63.6, 28.7,TRUE,https://github.com/nlpxucan/WizardLM
Qwen-72B,71.4, 63.0, 79.2, 59.5, 70.9, 75.3, 80.3,TRUE,https://github.com/QwenLM/Qwen
Mistral-7B,56.0, 61.7, 71.1, 39.1, 51.8, 49.0, 63.2,TRUE,https://mistral.ai/news/announcing-mistral-7b/
Nanbeige-Agent-32B,76.2, 88.5, 80.8, 61.5, 80.5, 79.7, 66.5,TRUE,https://huggingface.co/Nanbeige