Model,Regression
o1-mini,0.006386291669080422
Kimi K2 (*),0.008755733819178293
GLM-4.5 (*),0.009401377433951018
Gemini 2.0 Flash,0.02014259101793539
gpt-oss-20b (*),0.02101436392558346
Claude 3.7 Sonnet,0.024420369262990897
DeepSeek-R1 (*),0.029092431428115434
o3-mini (low),0.029369433445181732
o4-mini (low),0.033898295186622775
Qwen3 (*),0.03781949009387821
gpt-oss-120b (*),0.03837166857548772
o1 (medium),0.04736758264056298
o1 (low),0.05122960767011326
o3-mini (medium),0.05301720868707607
o1 (high),0.05956463340994888
Gemini 2.5,0.06601769831131873
o3-mini (high),0.08134530446226322
o3 (low),0.08395656885631946
o4-mini (medium),0.09904039320707186
o3-pro,0.10377988771822208
o3 (medium),0.10997506164480868
o4-mini (high),0.11882324432184375
o3 (high),0.13215846743827347
Code (human-written),0.34329700160786886
