Rank,Judge,Z-Score,Correlation (r),Cohen's Kappa (κ),|z|,Human-Like?
1,gpt-4.5,0.9,0.874,0.806,0.9,✅ Yes
2,gpt-4.1,0.41,0.862,0.792,0.41,✅ Yes
3,gemini/gemini-2.5-flash-lite,-0.17,0.857,0.777,0.17,✅ Yes
4,claude-sonnet-4,-0.44,0.847,0.768,0.44,✅ Yes
5,gemini/gemini-2.0-flash,-0.44,0.843,0.769,0.44,✅ Yes
6,gpt-4o,-1.55,0.818,0.728,1.55,❌ No
7,gemini/gemini-2.0-flash-lite,-1.72,0.813,0.727,1.72,❌ No
8,gpt-4,-1.73,0.811,0.723,1.73,❌ No
9,gpt-5-chat,-1.85,0.809,0.72,1.85,❌ No
10,gpt-4o-mini,-2.2,0.804,0.709,2.2,❌ No
11,gpt-4.1-mini,-2.52,0.79,0.702,2.52,❌ No
