model_name,thinking_mode,model_family,instruction_tuned,source,ChartQA,DocVQA,DocVQA-test,GPQA-Diamond,LiveCodeBench,MATH,MBPP,MGSM,MMLU,MMLU-Pro,MMMU,MMMU-Pro,MTOB-full,MTOB-half,MathVista,TyDiQA
Meta/LLaMA-3.1-70B,Non-thinking,LLaMA,No,LLaMA-4 Report (Pre-trained models),,,,,,41.6,66.4,,79.3,53.8,,,,,,29.9
Meta/LLaMA-3.1-405B,Non-thinking,LLaMA,No,LLaMA-4 Report (Pre-trained models),,,,,,53.5,74.4,,85.2,61.6,,,,,,34.3
Meta/LLaMA-4-Scout,Non-thinking,LLaMA,No,LLaMA-4 Report (Pre-trained models),83.4,89.4,,,,50.3,67.8,,79.6,58.2,,,,,,31.5
Meta/LLaMA-4-Maverick,Non-thinking,LLaMA,No,LLaMA-4 Report (Pre-trained models),85.3,91.6,,,,61.2,77.6,,85.5,62.9,,,,,,31.7
Meta/LLaMA-3.3-70B,Non-thinking,LLaMA,Yes,LLaMA-4 Report (Instruction tuned models),,,,50.5,33.3,,,91.1,,68.9,,,,,,
Meta/LLaMA-3.1-405B,Non-thinking,LLaMA,Yes,LLaMA-4 Report (Instruction tuned models),,,,49.0,27.7,,,91.6,,73.4,,,Context-128K,Context-128K,,
Meta/LLaMA-4-Scout,Non-thinking,LLaMA,Yes,LLaMA-4 Report (Instruction tuned models),88.8,,94.4,57.2,32.8,,,90.6,,74.3,69.4,52.2,39.7/36.3,42.2/36.6,70.7,
Meta/LLaMA-4-Maverick,Non-thinking,LLaMA,Yes,LLaMA-4 Report (Instruction tuned models),90.0,,94.4,69.8,43.4,,,92.3,,80.5,73.4,59.6,50.8/46.7,54.0/46.4,73.7,
