Rank,Judge,Z-Score,Correlation (r),Cohen's Kappa (κ),|z|,Human-Like?
1,mistralai/mixtral-8x22b-instruct-v0.1,1.45,0.879,0.813,1.45,⚙️ Super-Consistent
2,meta-llama/Meta-Llama-3-70B-Instruct,1.43,0.88,0.811,1.43,⚙️ Super-Consistent
3,google/gemma-3-27b-it,1.34,0.879,0.812,1.34,⚙️ Super-Consistent
4,jondurbin/bagel-34b-v0.2,1.01,0.872,0.804,1.01,⚙️ Super-Consistent
5,meta/llama-3.1-70b-instruct,0.61,0.868,0.798,0.61,✅ Yes
6,meta/llama-3.1-405b-instruct,0.31,0.862,0.79,0.31,✅ Yes
7,mistralai/Mistral-Large-Instruct-2407,0.26,0.87,0.789,0.26,✅ Yes
8,meta-llama/Llama-3.3-70B-Instruct,0.18,0.86,0.786,0.18,✅ Yes
9,Qwen/Qwen2.5-72B-Instruct,0.14,0.858,0.785,0.14,✅ Yes
10,Qwen/Qwen3-30B-A3B-Instruct-2507,-0.04,0.846,0.78,0.04,✅ Yes
11,nvidia/llama-3.3-nemotron-super-49b-v1,-0.2,0.852,0.775,0.2,✅ Yes
12,microsoft/Phi-3.5-MoE-instruct,-0.31,0.84,0.775,0.31,✅ Yes
13,nv-mistralai/mistral-nemo-12b-instruct,-0.39,0.842,0.774,0.39,✅ Yes
14,meta/llama-4-scout-17b-16e-instruct,-0.55,0.834,0.768,0.55,✅ Yes
15,openai/gpt-oss-20b,-0.58,0.837,0.765,0.58,✅ Yes
16,jondurbin/bagel-dpo-8x7b-v0.2,-0.63,0.83,0.766,0.63,✅ Yes
17,nvidia/llama-3.1-nemotron-ultra-253b-v1,-0.65,0.833,0.767,0.65,✅ Yes
18,MaziyarPanahi/calme-3.2-instruct-78b,-0.82,0.833,0.757,0.82,✅ Yes
19,openai/gpt-oss-120b,-0.85,0.833,0.756,0.85,✅ Yes
20,nvidia/llama-3.1-nemotron-70b-instruct,-0.87,0.838,0.756,0.87,✅ Yes
21,nvidia/llama-3.3-nemotron-super-49b-v1.5,-0.94,0.826,0.762,0.94,✅ Yes
22,Qwen/Qwen2.5-32B-Instruct,-0.96,0.831,0.753,0.96,✅ Yes
23,Qwen/Qwen3-4B-Instruct-2507,-1.26,0.818,0.749,1.26,❌ No
24,mistralai/Mixtral-8x7B-Instruct-v0.1,-1.31,0.823,0.754,1.31,❌ No
25,meta/llama-4-maverick-17b-128e-instruct,-2.18,0.802,0.712,2.18,❌ No
26,Qwen/Qwen3-14B,-2.38,0.795,0.705,2.38,❌ No
27,mistralai/Devstral-Small-2507,-2.54,0.796,0.726,2.54,❌ No
28,meta/llama-3.1-8b-instruct,-2.73,0.8,0.73,2.73,❌ No
29,CohereLabs/c4ai-command-r7b-12-2024,-2.88,0.793,0.724,2.88,❌ No
30,google/gemma-3-12b-it,-2.94,0.772,0.686,2.94,❌ No
31,microsoft/Phi-mini-MoE-instruct,-3.38,0.773,0.706,3.38,❌ No
32,microsoft/Phi-4-mini-instruct,-3.56,0.778,0.719,3.56,❌ No
33,nvidia/llama-3.1-nemotron-nano-8b-v1,-3.67,0.771,0.712,3.67,❌ No
34,meta-llama/Meta-Llama-3-8B-Instruct,-4.1,0.778,0.699,4.1,❌ No
35,google/gemma-2-2b-it,-4.62,0.74,0.681,4.62,❌ No
36,mistralai/Ministral-8B-Instruct-2410,-5.01,0.762,0.663,5.01,❌ No
37,deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct,-6.63,0.729,0.634,6.63,❌ No
38,meta/llama-3.2-3b-instruct,-7.42,0.731,0.626,7.42,❌ No
39,Qwen/Qwen2.5-7B-Instruct,-7.74,0.67,0.532,7.74,❌ No
40,nvidia/nemotron-mini-4b-instruct,-7.76,0.671,0.615,7.76,❌ No
41,ai21labs/AI21-Jamba-Mini-1.7,-11.34,0.645,0.5,11.34,❌ No
42,google/gemma-3-1b-it,-13.49,0.614,0.56,13.49,❌ No
43,meta/llama-3.2-1b-instruct,-54.74,0.02,0.005,54.74,❌ No
