model_name,thinking_mode,model_family,instruction_tuned,source,ARC-Challenge,AlpacaEval-2,ArenaHard,AttaQ,Avg,BBH,BigBenchHard,DROP,GPQA,GSM8K,GSM8K+Py,HellaSwag,HumanEval,HumanEval+,IFEval,MATH,MATH+Py,MATH-Lv5,MMLU,MMLU-Pro,MUSR,OCW,PopQA,SAT,TruthfulQA,Winogrande
IBM/Granite-3.3-8B-Instruct,Non-thinking,Granite,Yes,Granite Report (Instruction Following & General Benchmarks),,61.16,55.23,85.99,,,65.6,50.73,,83.09,,,89.47,86.88,73.57,,,,66.93,,,,28.08,,66.37,
IBM/Granite-8B-Code-Base,Non-thinking,Granite,No,Granite Report Table 15 (Chain-of-thought math tasks),,,,,,,,,,61.9,63.1,,,,,21.4,35.4,,,,,8.8,,62.5,,
IBM/Granite-3.1-8B-Instruct,Non-thinking,Granite,Yes,Granite Report HuggingFace Leaderboard V1,62.62,,,,71.31,,,,,73.84,,84.48,,,,,,,65.34,,,,,,66.23,75.37
IBM/Granite-3.1-2B-Instruct,Non-thinking,Granite,Yes,Granite Report HuggingFace Leaderboard V1,54.61,,,,60.79,,,,,52.76,,75.14,,,,,,,55.31,,,,,,59.42,67.48
IBM/Granite-3.1-3B-A800M-Instruct,Non-thinking,Granite,Yes,Granite Report HuggingFace Leaderboard V1,50.42,,,,56.53,,,,,48.97,,73.01,,,,,,,52.19,,,,,,49.71,64.87
IBM/Granite-3.1-1B-A400M-Instruct,Non-thinking,Granite,Yes,Granite Report HuggingFace Leaderboard V1,42.66,,,,46.29,,,,,33.88,,65.97,,,,,,,26.13,,,,,,46.77,62.35
IBM/Granite-3.1-8B-Instruct,Non-thinking,Granite,Yes,Granite Report HuggingFace Leaderboard V2,,,,,30.55,34.09,,,8.28,,,,,,72.08,,,21.68,,28.19,19.01,,,,,
IBM/Granite-3.1-2B-Instruct,Non-thinking,Granite,Yes,Granite Report HuggingFace Leaderboard V2,,,,,21.06,21.82,,,5.26,,,,,,62.86,,,11.33,,20.21,4.87,,,,,
IBM/Granite-3.1-3B-A800M-Instruct,Non-thinking,Granite,Yes,Granite Report HuggingFace Leaderboard V2,,,,,17.1,16.69,,,5.15,,,,,,55.16,,,10.35,,12.75,2.51,,,,,
IBM/Granite-3.1-1B-A400M-Instruct,Non-thinking,Granite,Yes,Granite Report HuggingFace Leaderboard V2,,,,,10.05,6.18,,,0.78,,,,,,46.86,,,4.08,,2.41,0.78,,,,,
