total_samples,cot_correct_answers,es_correct_answers,cot_average_tokens,es_average_tokens,es_step_average_tokens,model,dataset,min_slope,threshold
30,19,10,134762.06666666668,63580.4,44633.03333333333,DeepSeek-R1-Distill-Llama-8B,aime,3,0.01
30,19,10,134762.06666666668,63488.53333333333,44595.03333333333,DeepSeek-R1-Distill-Llama-8B,aime,3,0.05
30,19,10,134762.06666666668,63189.7,44478.433333333334,DeepSeek-R1-Distill-Llama-8B,aime,3,0.1
30,19,10,134762.06666666668,63188.166666666664,44477.76666666667,DeepSeek-R1-Distill-Llama-8B,aime,3,0.15
30,19,10,134762.06666666668,63179.86666666667,44477.1,DeepSeek-R1-Distill-Llama-8B,aime,3,0.2
30,19,16,134762.06666666668,82788.03333333334,58232.6,DeepSeek-R1-Distill-Llama-8B,aime,5,0.01
30,19,16,134762.06666666668,82711.56666666667,58200.6,DeepSeek-R1-Distill-Llama-8B,aime,5,0.05
30,19,16,134762.06666666668,82572.86666666667,58115.26666666667,DeepSeek-R1-Distill-Llama-8B,aime,5,0.1
30,19,16,134762.06666666668,82571.33333333333,58114.6,DeepSeek-R1-Distill-Llama-8B,aime,5,0.15
30,19,16,134762.06666666668,82563.03333333334,58113.933333333334,DeepSeek-R1-Distill-Llama-8B,aime,5,0.2
30,19,18,134762.06666666668,91294.5,64596.566666666666,DeepSeek-R1-Distill-Llama-8B,aime,7,0.01
30,19,18,134762.06666666668,91294.5,64596.566666666666,DeepSeek-R1-Distill-Llama-8B,aime,7,0.05
30,19,18,134762.06666666668,91237.9,64566.566666666666,DeepSeek-R1-Distill-Llama-8B,aime,7,0.1
30,19,18,134762.06666666668,91237.9,64566.566666666666,DeepSeek-R1-Distill-Llama-8B,aime,7,0.15
30,19,18,134762.06666666668,91237.9,64566.566666666666,DeepSeek-R1-Distill-Llama-8B,aime,7,0.2
30,19,18,134762.06666666668,98212.93333333333,69698.93333333333,DeepSeek-R1-Distill-Llama-8B,aime,10,0.01
30,19,18,134762.06666666668,98212.93333333333,69698.93333333333,DeepSeek-R1-Distill-Llama-8B,aime,10,0.05
30,19,18,134762.06666666668,98212.93333333333,69698.93333333333,DeepSeek-R1-Distill-Llama-8B,aime,10,0.1
30,19,18,134762.06666666668,98212.93333333333,69698.93333333333,DeepSeek-R1-Distill-Llama-8B,aime,10,0.15
30,19,18,134762.06666666668,98212.93333333333,69698.93333333333,DeepSeek-R1-Distill-Llama-8B,aime,10,0.2
30,19,18,134762.06666666668,106285.1,75904.86666666667,DeepSeek-R1-Distill-Llama-8B,aime,15,0.01
30,19,18,134762.06666666668,106285.1,75904.86666666667,DeepSeek-R1-Distill-Llama-8B,aime,15,0.05
30,19,18,134762.06666666668,106285.1,75904.86666666667,DeepSeek-R1-Distill-Llama-8B,aime,15,0.1
30,19,18,134762.06666666668,106285.1,75904.86666666667,DeepSeek-R1-Distill-Llama-8B,aime,15,0.15
30,19,18,134762.06666666668,106285.1,75904.86666666667,DeepSeek-R1-Distill-Llama-8B,aime,15,0.2
30,19,18,134762.06666666668,110793.46666666666,79356.4,DeepSeek-R1-Distill-Llama-8B,aime,20,0.01
30,19,18,134762.06666666668,110793.46666666666,79356.4,DeepSeek-R1-Distill-Llama-8B,aime,20,0.05
30,19,18,134762.06666666668,110793.46666666666,79356.4,DeepSeek-R1-Distill-Llama-8B,aime,20,0.1
30,19,18,134762.06666666668,110793.46666666666,79356.4,DeepSeek-R1-Distill-Llama-8B,aime,20,0.15
30,19,18,134762.06666666668,110793.46666666666,79356.4,DeepSeek-R1-Distill-Llama-8B,aime,20,0.2
40,0,0,66549.975,23655.375,14650.625,DeepSeek-R1-Distill-Llama-8B,amc,3,0.01
40,0,0,66549.975,23390.025,14536.5,DeepSeek-R1-Distill-Llama-8B,amc,3,0.05
40,0,0,66549.975,23226.0,14431.75,DeepSeek-R1-Distill-Llama-8B,amc,3,0.1
40,0,0,66549.975,23140.1,14379.25,DeepSeek-R1-Distill-Llama-8B,amc,3,0.15
40,0,0,66549.975,23100.75,14361.25,DeepSeek-R1-Distill-Llama-8B,amc,3,0.2
40,0,0,66549.975,30467.5,20076.475,DeepSeek-R1-Distill-Llama-8B,amc,5,0.01
40,0,0,66549.975,30313.025,20012.9,DeepSeek-R1-Distill-Llama-8B,amc,5,0.05
40,0,0,66549.975,30192.125,19934.675,DeepSeek-R1-Distill-Llama-8B,amc,5,0.1
40,0,0,66549.975,30164.075,19918.175,DeepSeek-R1-Distill-Llama-8B,amc,5,0.15
40,0,0,66549.975,30129.375,19903.675,DeepSeek-R1-Distill-Llama-8B,amc,5,0.2
40,0,0,66549.975,34675.25,23024.15,DeepSeek-R1-Distill-Llama-8B,amc,7,0.01
40,0,0,66549.975,34563.85,22976.975,DeepSeek-R1-Distill-Llama-8B,amc,7,0.05
40,0,0,66549.975,34446.95,22900.825,DeepSeek-R1-Distill-Llama-8B,amc,7,0.1
40,0,0,66549.975,34420.225,22885.325,DeepSeek-R1-Distill-Llama-8B,amc,7,0.15
40,0,0,66549.975,34384.95,22870.825,DeepSeek-R1-Distill-Llama-8B,amc,7,0.2
40,0,0,66549.975,39211.225,26398.3,DeepSeek-R1-Distill-Llama-8B,amc,10,0.01
40,0,0,66549.975,39150.9,26374.15,DeepSeek-R1-Distill-Llama-8B,amc,10,0.05
40,0,0,66549.975,39037.75,26299.5,DeepSeek-R1-Distill-Llama-8B,amc,10,0.1
40,0,0,66549.975,39013.5,26285.5,DeepSeek-R1-Distill-Llama-8B,amc,10,0.15
40,0,0,66549.975,38987.2,26274.5,DeepSeek-R1-Distill-Llama-8B,amc,10,0.2
40,0,0,66549.975,42966.55,29094.875,DeepSeek-R1-Distill-Llama-8B,amc,15,0.01
40,0,0,66549.975,42925.15,29081.375,DeepSeek-R1-Distill-Llama-8B,amc,15,0.05
40,0,0,66549.975,42815.95,29011.725,DeepSeek-R1-Distill-Llama-8B,amc,15,0.1
40,0,0,66549.975,42791.7,28997.725,DeepSeek-R1-Distill-Llama-8B,amc,15,0.15
40,0,0,66549.975,42785.575,28996.225,DeepSeek-R1-Distill-Llama-8B,amc,15,0.2
40,0,0,66549.975,46744.85,32347.85,DeepSeek-R1-Distill-Llama-8B,amc,20,0.01
40,0,0,66549.975,46737.975,32344.85,DeepSeek-R1-Distill-Llama-8B,amc,20,0.05
40,0,0,66549.975,46649.025,32297.275,DeepSeek-R1-Distill-Llama-8B,amc,20,0.1
40,0,0,66549.975,46638.525,32293.275,DeepSeek-R1-Distill-Llama-8B,amc,20,0.15
40,0,0,66549.975,46638.525,32293.275,DeepSeek-R1-Distill-Llama-8B,amc,20,0.2
198,102,75,70824.06565656565,16921.439393939392,5360.989898989899,DeepSeek-R1-Distill-Llama-8B,gpqa,3,0.01
198,102,69,70824.06565656565,15858.176767676769,4984.383838383838,DeepSeek-R1-Distill-Llama-8B,gpqa,3,0.05
198,102,68,70824.06565656565,15420.348484848484,4825.30303030303,DeepSeek-R1-Distill-Llama-8B,gpqa,3,0.1
198,102,67,70824.06565656565,15181.131313131313,4748.489898989899,DeepSeek-R1-Distill-Llama-8B,gpqa,3,0.15
198,102,67,70824.06565656565,15142.227272727272,4735.79797979798,DeepSeek-R1-Distill-Llama-8B,gpqa,3,0.2
198,102,86,70824.06565656565,29324.444444444445,11464.040404040405,DeepSeek-R1-Distill-Llama-8B,gpqa,5,0.01
198,102,84,70824.06565656565,28611.0,11217.929292929293,DeepSeek-R1-Distill-Llama-8B,gpqa,5,0.05
198,102,85,70824.06565656565,28450.883838383837,11150.040404040405,DeepSeek-R1-Distill-Llama-8B,gpqa,5,0.1
198,102,85,70824.06565656565,28329.070707070707,11091.878787878788,DeepSeek-R1-Distill-Llama-8B,gpqa,5,0.15
198,102,85,70824.06565656565,28312.20202020202,11084.70707070707,DeepSeek-R1-Distill-Llama-8B,gpqa,5,0.2
198,102,95,70824.06565656565,39718.64646464647,17087.878787878788,DeepSeek-R1-Distill-Llama-8B,gpqa,7,0.01
198,102,94,70824.06565656565,39143.166666666664,16873.176767676767,DeepSeek-R1-Distill-Llama-8B,gpqa,7,0.05
198,102,94,70824.06565656565,39032.58585858586,16825.116161616163,DeepSeek-R1-Distill-Llama-8B,gpqa,7,0.1
198,102,94,70824.06565656565,38991.35353535353,16806.439393939392,DeepSeek-R1-Distill-Llama-8B,gpqa,7,0.15
198,102,94,70824.06565656565,38977.88888888889,16798.090909090908,DeepSeek-R1-Distill-Llama-8B,gpqa,7,0.2
198,102,100,70824.06565656565,49598.20202020202,22698.136363636364,DeepSeek-R1-Distill-Llama-8B,gpqa,10,0.01
198,102,100,70824.06565656565,49379.11111111111,22614.29292929293,DeepSeek-R1-Distill-Llama-8B,gpqa,10,0.05
198,102,100,70824.06565656565,49328.19191919192,22588.70707070707,DeepSeek-R1-Distill-Llama-8B,gpqa,10,0.1
198,102,100,70824.06565656565,49305.85858585859,22579.141414141413,DeepSeek-R1-Distill-Llama-8B,gpqa,10,0.15
198,102,100,70824.06565656565,49300.92929292929,22576.540404040403,DeepSeek-R1-Distill-Llama-8B,gpqa,10,0.2
198,102,101,70824.06565656565,57191.89898989899,26835.939393939392,DeepSeek-R1-Distill-Llama-8B,gpqa,15,0.01
198,102,101,70824.06565656565,57100.08585858586,26801.065656565657,DeepSeek-R1-Distill-Llama-8B,gpqa,15,0.05
198,102,101,70824.06565656565,57076.93434343435,26788.045454545456,DeepSeek-R1-Distill-Llama-8B,gpqa,15,0.1
198,102,101,70824.06565656565,57071.40909090909,26785.47474747475,DeepSeek-R1-Distill-Llama-8B,gpqa,15,0.15
198,102,101,70824.06565656565,57059.4494949495,26779.348484848484,DeepSeek-R1-Distill-Llama-8B,gpqa,15,0.2
198,102,101,70824.06565656565,61307.292929292926,29095.646464646463,DeepSeek-R1-Distill-Llama-8B,gpqa,20,0.01
198,102,101,70824.06565656565,61248.42929292929,29070.136363636364,DeepSeek-R1-Distill-Llama-8B,gpqa,20,0.05
198,102,101,70824.06565656565,61225.954545454544,29060.161616161615,DeepSeek-R1-Distill-Llama-8B,gpqa,20,0.1
198,102,101,70824.06565656565,61221.156565656565,29058.09595959596,DeepSeek-R1-Distill-Llama-8B,gpqa,20,0.15
198,102,101,70824.06565656565,61219.36363636364,29056.883838383837,DeepSeek-R1-Distill-Llama-8B,gpqa,20,0.2
500,332,286,33219.032,12446.762,6024.826,DeepSeek-R1-Distill-Llama-8B,math,3,0.01
500,332,283,33219.032,12027.218,5874.528,DeepSeek-R1-Distill-Llama-8B,math,3,0.05
500,332,282,33219.032,11838.554,5809.958,DeepSeek-R1-Distill-Llama-8B,math,3,0.1
500,332,280,33219.032,11798.832,5794.9,DeepSeek-R1-Distill-Llama-8B,math,3,0.15
500,332,280,33219.032,11786.832,5789.824,DeepSeek-R1-Distill-Llama-8B,math,3,0.2
500,332,307,33219.032,16442.976,8516.87,DeepSeek-R1-Distill-Llama-8B,math,5,0.01
500,332,305,33219.032,16142.218,8399.224,DeepSeek-R1-Distill-Llama-8B,math,5,0.05
500,332,305,33219.032,16071.012,8368.576,DeepSeek-R1-Distill-Llama-8B,math,5,0.1
500,332,305,33219.032,16052.928,8361.918,DeepSeek-R1-Distill-Llama-8B,math,5,0.15
500,332,305,33219.032,16042.364,8355.68,DeepSeek-R1-Distill-Llama-8B,math,5,0.2
500,332,316,33219.032,18992.444,10075.422,DeepSeek-R1-Distill-Llama-8B,math,7,0.01
500,332,314,33219.032,18771.662,9982.074,DeepSeek-R1-Distill-Llama-8B,math,7,0.05
500,332,314,33219.032,18704.924,9943.724,DeepSeek-R1-Distill-Llama-8B,math,7,0.1
500,332,314,33219.032,18695.16,9939.676,DeepSeek-R1-Distill-Llama-8B,math,7,0.15
500,332,314,33219.032,18692.566,9937.578,DeepSeek-R1-Distill-Llama-8B,math,7,0.2
500,332,322,33219.032,21794.824,11792.428,DeepSeek-R1-Distill-Llama-8B,math,10,0.01
500,332,321,33219.032,21640.016,11724.762,DeepSeek-R1-Distill-Llama-8B,math,10,0.05
500,332,321,33219.032,21620.72,11714.066,DeepSeek-R1-Distill-Llama-8B,math,10,0.1
500,332,321,33219.032,21617.78,11712.464,DeepSeek-R1-Distill-Llama-8B,math,10,0.15
500,332,321,33219.032,21615.478,11710.76,DeepSeek-R1-Distill-Llama-8B,math,10,0.2
500,332,327,33219.032,24434.986,13458.712,DeepSeek-R1-Distill-Llama-8B,math,15,0.01
500,332,326,33219.032,24317.952,13397.066,DeepSeek-R1-Distill-Llama-8B,math,15,0.05
500,332,326,33219.032,24300.346,13387.996,DeepSeek-R1-Distill-Llama-8B,math,15,0.1
500,332,326,33219.032,24297.888,13386.572,DeepSeek-R1-Distill-Llama-8B,math,15,0.15
500,332,326,33219.032,24296.39,13385.252,DeepSeek-R1-Distill-Llama-8B,math,15,0.2
500,332,330,33219.032,26223.988,14572.112,DeepSeek-R1-Distill-Llama-8B,math,20,0.01
500,332,329,33219.032,26169.388,14549.796,DeepSeek-R1-Distill-Llama-8B,math,20,0.05
500,332,328,33219.032,26157.258,14538.212,DeepSeek-R1-Distill-Llama-8B,math,20,0.1
500,332,328,33219.032,26155.804,14537.046,DeepSeek-R1-Distill-Llama-8B,math,20,0.15
500,332,328,33219.032,26154.37,14535.806,DeepSeek-R1-Distill-Llama-8B,math,20,0.2
272,52,55,42329.65073529412,25990.75,13033.900735294117,DeepSeek-R1-Distill-Llama-8B,minerva,3,0.01
272,52,55,42329.65073529412,25821.319852941175,12975.558823529413,DeepSeek-R1-Distill-Llama-8B,minerva,3,0.05
272,52,55,42329.65073529412,25700.75,12925.136029411764,DeepSeek-R1-Distill-Llama-8B,minerva,3,0.1
272,52,55,42329.65073529412,25684.952205882353,12919.625,DeepSeek-R1-Distill-Llama-8B,minerva,3,0.15
272,52,55,42329.65073529412,25674.007352941175,12912.066176470587,DeepSeek-R1-Distill-Llama-8B,minerva,3,0.2
272,52,56,42329.65073529412,31917.683823529413,16548.227941176472,DeepSeek-R1-Distill-Llama-8B,minerva,5,0.01
272,52,56,42329.65073529412,31827.60661764706,16510.867647058825,DeepSeek-R1-Distill-Llama-8B,minerva,5,0.05
272,52,56,42329.65073529412,31813.448529411766,16504.97426470588,DeepSeek-R1-Distill-Llama-8B,minerva,5,0.1
272,52,56,42329.65073529412,31805.15073529412,16501.577205882353,DeepSeek-R1-Distill-Llama-8B,minerva,5,0.15
272,52,56,42329.65073529412,31759.1875,16465.70588235294,DeepSeek-R1-Distill-Llama-8B,minerva,5,0.2
272,52,53,42329.65073529412,35068.080882352944,18490.966911764706,DeepSeek-R1-Distill-Llama-8B,minerva,7,0.01
272,52,53,42329.65073529412,34996.46323529412,18458.797794117647,DeepSeek-R1-Distill-Llama-8B,minerva,7,0.05
272,52,53,42329.65073529412,34952.356617647056,18432.110294117647,DeepSeek-R1-Distill-Llama-8B,minerva,7,0.1
272,52,53,42329.65073529412,34948.25,18430.077205882353,DeepSeek-R1-Distill-Llama-8B,minerva,7,0.15
272,52,53,42329.65073529412,34946.555147058825,18429.694852941175,DeepSeek-R1-Distill-Llama-8B,minerva,7,0.2
272,52,57,42329.65073529412,37743.58823529412,20152.683823529413,DeepSeek-R1-Distill-Llama-8B,minerva,10,0.01
272,52,57,42329.65073529412,37671.794117647056,20119.54411764706,DeepSeek-R1-Distill-Llama-8B,minerva,10,0.05
272,52,57,42329.65073529412,37664.74632352941,20116.183823529413,DeepSeek-R1-Distill-Llama-8B,minerva,10,0.1
272,52,57,42329.65073529412,37661.029411764706,20114.566176470587,DeepSeek-R1-Distill-Llama-8B,minerva,10,0.15
272,52,57,42329.65073529412,37660.768382352944,20114.345588235294,DeepSeek-R1-Distill-Llama-8B,minerva,10,0.2
272,52,52,42329.65073529412,39664.794117647056,21385.128676470587,DeepSeek-R1-Distill-Llama-8B,minerva,15,0.01
272,52,52,42329.65073529412,39638.867647058825,21372.639705882353,DeepSeek-R1-Distill-Llama-8B,minerva,15,0.05
272,52,52,42329.65073529412,39636.75,21371.077205882353,DeepSeek-R1-Distill-Llama-8B,minerva,15,0.1
272,52,52,42329.65073529412,39633.15073529412,21369.533088235294,DeepSeek-R1-Distill-Llama-8B,minerva,15,0.15
272,52,52,42329.65073529412,39633.15073529412,21369.533088235294,DeepSeek-R1-Distill-Llama-8B,minerva,15,0.2
272,52,52,42329.65073529412,40543.33823529412,21942.110294117647,DeepSeek-R1-Distill-Llama-8B,minerva,20,0.01
272,52,52,42329.65073529412,40532.51470588235,21938.970588235294,DeepSeek-R1-Distill-Llama-8B,minerva,20,0.05
272,52,52,42329.65073529412,40532.408088235294,21938.897058823528,DeepSeek-R1-Distill-Llama-8B,minerva,20,0.1
272,52,52,42329.65073529412,40532.408088235294,21938.897058823528,DeepSeek-R1-Distill-Llama-8B,minerva,20,0.15
272,52,52,42329.65073529412,40532.408088235294,21938.897058823528,DeepSeek-R1-Distill-Llama-8B,minerva,20,0.2
675,262,197,78464.72888888889,24354.014814814815,12961.477037037037,DeepSeek-R1-Distill-Llama-8B,olympiadbench,3,0.01
675,262,195,78464.72888888889,23872.792592592592,12770.580740740741,DeepSeek-R1-Distill-Llama-8B,olympiadbench,3,0.05
675,262,192,78464.72888888889,23659.309629629628,12666.961481481481,DeepSeek-R1-Distill-Llama-8B,olympiadbench,3,0.1
675,262,192,78464.72888888889,23604.425185185184,12643.435555555556,DeepSeek-R1-Distill-Llama-8B,olympiadbench,3,0.15
675,262,191,78464.72888888889,23577.87851851852,12631.678518518518,DeepSeek-R1-Distill-Llama-8B,olympiadbench,3,0.2
675,262,235,78464.72888888889,35026.09777777778,20292.515555555554,DeepSeek-R1-Distill-Llama-8B,olympiadbench,5,0.01
675,262,233,78464.72888888889,34652.35407407407,20134.62814814815,DeepSeek-R1-Distill-Llama-8B,olympiadbench,5,0.05
675,262,232,78464.72888888889,34539.665185185186,20073.51851851852,DeepSeek-R1-Distill-Llama-8B,olympiadbench,5,0.1
675,262,232,78464.72888888889,34483.34074074074,20041.557037037037,DeepSeek-R1-Distill-Llama-8B,olympiadbench,5,0.15
675,262,232,78464.72888888889,34468.26962962963,20033.226666666666,DeepSeek-R1-Distill-Llama-8B,olympiadbench,5,0.2
675,262,248,78464.72888888889,41922.26962962963,25170.998518518518,DeepSeek-R1-Distill-Llama-8B,olympiadbench,7,0.01
675,262,246,78464.72888888889,41605.80888888889,25039.862222222222,DeepSeek-R1-Distill-Llama-8B,olympiadbench,7,0.05
675,262,246,78464.72888888889,41538.10666666667,25000.706666666665,DeepSeek-R1-Distill-Llama-8B,olympiadbench,7,0.1
675,262,246,78464.72888888889,41501.28592592593,24978.717037037037,DeepSeek-R1-Distill-Llama-8B,olympiadbench,7,0.15
675,262,246,78464.72888888889,41491.939259259256,24973.645925925925,DeepSeek-R1-Distill-Llama-8B,olympiadbench,7,0.2
675,262,250,78464.72888888889,48580.39111111111,29858.48,DeepSeek-R1-Distill-Llama-8B,olympiadbench,10,0.01
675,262,249,78464.72888888889,48351.52,29742.18074074074,DeepSeek-R1-Distill-Llama-8B,olympiadbench,10,0.05
675,262,248,78464.72888888889,48316.65925925926,29724.05925925926,DeepSeek-R1-Distill-Llama-8B,olympiadbench,10,0.1
675,262,248,78464.72888888889,48294.921481481484,29712.26814814815,DeepSeek-R1-Distill-Llama-8B,olympiadbench,10,0.15
675,262,248,78464.72888888889,48289.31555555556,29709.122962962963,DeepSeek-R1-Distill-Llama-8B,olympiadbench,10,0.2
675,262,259,78464.72888888889,55604.15851851852,34738.70814814815,DeepSeek-R1-Distill-Llama-8B,olympiadbench,15,0.01
675,262,259,78464.72888888889,55501.734814814816,34687.88296296296,DeepSeek-R1-Distill-Llama-8B,olympiadbench,15,0.05
675,262,259,78464.72888888889,55455.84888888889,34661.583703703705,DeepSeek-R1-Distill-Llama-8B,olympiadbench,15,0.1
675,262,259,78464.72888888889,55442.03555555556,34653.76296296297,DeepSeek-R1-Distill-Llama-8B,olympiadbench,15,0.15
675,262,259,78464.72888888889,55437.33185185185,34651.15555555555,DeepSeek-R1-Distill-Llama-8B,olympiadbench,15,0.2
675,262,261,78464.72888888889,59537.497777777775,37420.45333333333,DeepSeek-R1-Distill-Llama-8B,olympiadbench,20,0.01
675,262,261,78464.72888888889,59465.002962962964,37379.48740740741,DeepSeek-R1-Distill-Llama-8B,olympiadbench,20,0.05
675,262,261,78464.72888888889,59447.746666666666,37369.40888888889,DeepSeek-R1-Distill-Llama-8B,olympiadbench,20,0.1
675,262,261,78464.72888888889,59443.58814814815,37366.625185185185,DeepSeek-R1-Distill-Llama-8B,olympiadbench,20,0.15
675,262,261,78464.72888888889,59439.3837037037,37364.52148148148,DeepSeek-R1-Distill-Llama-8B,olympiadbench,20,0.2
30,23,12,132060.46666666667,52793.53333333333,29188.4,QwQ-32B,aime,3,0.01
30,23,12,132060.46666666667,52461.63333333333,29002.733333333334,QwQ-32B,aime,3,0.05
30,23,12,132060.46666666667,52342.13333333333,28941.133333333335,QwQ-32B,aime,3,0.1
30,23,12,132060.46666666667,52092.76666666667,28844.6,QwQ-32B,aime,3,0.15
30,23,12,132060.46666666667,52085.4,28842.6,QwQ-32B,aime,3,0.2
30,23,17,132060.46666666667,66170.2,37879.6,QwQ-32B,aime,5,0.01
30,23,17,132060.46666666667,65922.06666666667,37739.0,QwQ-32B,aime,5,0.05
30,23,17,132060.46666666667,65903.9,37729.666666666664,QwQ-32B,aime,5,0.1
30,23,17,132060.46666666667,65898.06666666667,37726.333333333336,QwQ-32B,aime,5,0.15
30,23,17,132060.46666666667,65894.46666666666,37725.0,QwQ-32B,aime,5,0.2
30,23,19,132060.46666666667,75703.6,44831.8,QwQ-32B,aime,7,0.01
30,23,19,132060.46666666667,75479.03333333334,44701.86666666667,QwQ-32B,aime,7,0.05
30,23,19,132060.46666666667,75448.93333333333,44682.333333333336,QwQ-32B,aime,7,0.1
30,23,19,132060.46666666667,75444.9,44679.666666666664,QwQ-32B,aime,7,0.15
30,23,19,132060.46666666667,75441.3,44678.333333333336,QwQ-32B,aime,7,0.2
30,23,20,132060.46666666667,83920.23333333334,51074.833333333336,QwQ-32B,aime,10,0.01
30,23,20,132060.46666666667,83719.26666666666,50953.566666666666,QwQ-32B,aime,10,0.05
30,23,20,132060.46666666667,83697.26666666666,50936.5,QwQ-32B,aime,10,0.1
30,23,20,132060.46666666667,83694.16666666667,50934.5,QwQ-32B,aime,10,0.15
30,23,20,132060.46666666667,83691.8,50933.833333333336,QwQ-32B,aime,10,0.2
30,23,22,132060.46666666667,90623.6,55757.933333333334,QwQ-32B,aime,15,0.01
30,23,22,132060.46666666667,90474.26666666666,55665.03333333333,QwQ-32B,aime,15,0.05
30,23,22,132060.46666666667,90469.73333333334,55662.36666666667,QwQ-32B,aime,15,0.1
30,23,22,132060.46666666667,90469.73333333334,55662.36666666667,QwQ-32B,aime,15,0.15
30,23,22,132060.46666666667,90469.73333333334,55662.36666666667,QwQ-32B,aime,15,0.2
30,23,21,132060.46666666667,94684.9,58489.73333333333,QwQ-32B,aime,20,0.01
30,23,21,132060.46666666667,94601.4,58438.4,QwQ-32B,aime,20,0.05
30,23,21,132060.46666666667,94601.4,58438.4,QwQ-32B,aime,20,0.1
30,23,21,132060.46666666667,94601.4,58438.4,QwQ-32B,aime,20,0.15
30,23,21,132060.46666666667,94601.4,58438.4,QwQ-32B,aime,20,0.2
40,0,0,72305.775,18994.25,11293.625,QwQ-32B,amc,3,0.01
40,0,0,72305.775,17923.575,10732.2,QwQ-32B,amc,3,0.05
40,0,0,72305.775,17582.6,10575.2,QwQ-32B,amc,3,0.1
40,0,0,72305.775,17457.3,10514.425,QwQ-32B,amc,3,0.15
40,0,0,72305.775,17430.6,10504.625,QwQ-32B,amc,3,0.2
40,0,0,72305.775,25647.9,17144.725,QwQ-32B,amc,5,0.01
40,0,0,72305.775,24801.375,16696.8,QwQ-32B,amc,5,0.05
40,0,0,72305.775,24621.7,16605.475,QwQ-32B,amc,5,0.1
40,0,0,72305.775,24528.125,16560.775,QwQ-32B,amc,5,0.15
40,0,0,72305.775,24506.45,16554.475,QwQ-32B,amc,5,0.2
40,0,0,72305.775,30424.45,21039.2,QwQ-32B,amc,7,0.01
40,0,0,72305.775,29748.05,20644.3,QwQ-32B,amc,7,0.05
40,0,0,72305.775,29588.15,20556.425,QwQ-32B,amc,7,0.1
40,0,0,72305.775,29492.6,20508.9,QwQ-32B,amc,7,0.15
40,0,0,72305.775,29478.6,20506.1,QwQ-32B,amc,7,0.2
40,0,0,72305.775,35114.525,24872.1,QwQ-32B,amc,10,0.01
40,0,0,72305.775,34576.55,24541.85,QwQ-32B,amc,10,0.05
40,0,0,72305.775,34417.825,24449.15,QwQ-32B,amc,10,0.1
40,0,0,72305.775,34390.15,24414.525,QwQ-32B,amc,10,0.15
40,0,0,72305.775,34382.5,24412.225,QwQ-32B,amc,10,0.2
40,0,0,72305.775,41070.05,29355.9,QwQ-32B,amc,15,0.01
40,0,0,72305.775,40855.85,29247.575,QwQ-32B,amc,15,0.05
40,0,0,72305.775,40775.875,29212.075,QwQ-32B,amc,15,0.1
40,0,0,72305.775,40746.825,29180.45,QwQ-32B,amc,15,0.15
40,0,0,72305.775,40741.8,29180.15,QwQ-32B,amc,15,0.2
40,0,0,72305.775,45079.5,32137.35,QwQ-32B,amc,20,0.01
40,0,0,72305.775,44865.325,32002.7,QwQ-32B,amc,20,0.05
40,0,0,72305.775,44798.275,31967.125,QwQ-32B,amc,20,0.1
40,0,0,72305.775,44771.925,31934.825,QwQ-32B,amc,20,0.15
40,0,0,72305.775,44766.9,31934.525,QwQ-32B,amc,20,0.2
198,136,87,70460.87373737374,16722.161616161615,3954.580808080808,QwQ-32B,gpqa,3,0.01
198,136,86,70460.87373737374,15219.838383838383,3505.020202020202,QwQ-32B,gpqa,3,0.05
198,136,85,70460.87373737374,14730.79797979798,3370.252525252525,QwQ-32B,gpqa,3,0.1
198,136,85,70460.87373737374,14483.29797979798,3308.5555555555557,QwQ-32B,gpqa,3,0.15
198,136,85,70460.87373737374,14391.09090909091,3279.6060606060605,QwQ-32B,gpqa,3,0.2
198,136,98,70460.87373737374,23629.454545454544,6376.843434343435,QwQ-32B,gpqa,5,0.01
198,136,96,70460.87373737374,22518.747474747473,5998.030303030303,QwQ-32B,gpqa,5,0.05
198,136,96,70460.87373737374,22282.909090909092,5916.611111111111,QwQ-32B,gpqa,5,0.1
198,136,95,70460.87373737374,22141.030303030304,5879.767676767677,QwQ-32B,gpqa,5,0.15
198,136,94,70460.87373737374,22100.156565656565,5861.666666666667,QwQ-32B,gpqa,5,0.2
198,136,110,70460.87373737374,30419.90404040404,9245.767676767677,QwQ-32B,gpqa,7,0.01
198,136,107,70460.87373737374,29438.47474747475,8903.88383838384,QwQ-32B,gpqa,7,0.05
198,136,107,70460.87373737374,29243.671717171717,8840.257575757576,QwQ-32B,gpqa,7,0.1
198,136,107,70460.87373737374,29160.89898989899,8813.111111111111,QwQ-32B,gpqa,7,0.15
198,136,106,70460.87373737374,29108.515151515152,8790.843434343435,QwQ-32B,gpqa,7,0.2
198,136,119,70460.87373737374,38583.22727272727,13030.757575757576,QwQ-32B,gpqa,10,0.01
198,136,119,70460.87373737374,37778.88383838384,12743.141414141413,QwQ-32B,gpqa,10,0.05
198,136,119,70460.87373737374,37621.166666666664,12690.29292929293,QwQ-32B,gpqa,10,0.1
198,136,119,70460.87373737374,37567.92424242424,12671.449494949495,QwQ-32B,gpqa,10,0.15
198,136,119,70460.87373737374,37541.156565656565,12655.631313131313,QwQ-32B,gpqa,10,0.2
198,136,128,70460.87373737374,47249.90404040404,17374.722222222223,QwQ-32B,gpqa,15,0.01
198,136,128,70460.87373737374,46798.80303030303,17191.303030303032,QwQ-32B,gpqa,15,0.05
198,136,128,70460.87373737374,46708.50505050505,17158.338383838385,QwQ-32B,gpqa,15,0.1
198,136,128,70460.87373737374,46681.36363636364,17152.60101010101,QwQ-32B,gpqa,15,0.15
198,136,128,70460.87373737374,46672.69191919192,17147.79797979798,QwQ-32B,gpqa,15,0.2
198,136,129,70460.87373737374,52630.69696969697,20050.555555555555,QwQ-32B,gpqa,20,0.01
198,136,129,70460.87373737374,52318.5101010101,19911.252525252527,QwQ-32B,gpqa,20,0.05
198,136,129,70460.87373737374,52248.767676767675,19883.378787878788,QwQ-32B,gpqa,20,0.1
198,136,129,70460.87373737374,52241.62626262626,19881.227272727272,QwQ-32B,gpqa,20,0.15
198,136,129,70460.87373737374,52232.93434343435,19876.29797979798,QwQ-32B,gpqa,20,0.2
500,338,286,40743.334,14394.944,6068.278,QwQ-32B,math,3,0.01
500,338,283,40743.334,13684.09,5774.784,QwQ-32B,math,3,0.05
500,338,280,40743.334,13444.204,5675.384,QwQ-32B,math,3,0.1
500,338,279,40743.334,13367.252,5635.614,QwQ-32B,math,3,0.15
500,338,277,40743.334,13331.204,5616.302,QwQ-32B,math,3,0.2
500,338,307,40743.334,18321.78,8330.338,QwQ-32B,math,5,0.01
500,338,305,40743.334,17726.132,8068.816,QwQ-32B,math,5,0.05
500,338,305,40743.334,17605.514,8012.72,QwQ-32B,math,5,0.1
500,338,304,40743.334,17545.44,7980.46,QwQ-32B,math,5,0.15
500,338,304,40743.334,17519.098,7965.644,QwQ-32B,math,5,0.2
500,338,319,40743.334,21212.16,9999.458,QwQ-32B,math,7,0.01
500,338,319,40743.334,20705.368,9777.268,QwQ-32B,math,7,0.05
500,338,319,40743.334,20597.61,9727.138,QwQ-32B,math,7,0.1
500,338,319,40743.334,20559.092,9705.918,QwQ-32B,math,7,0.15
500,338,319,40743.334,20540.83,9695.71,QwQ-32B,math,7,0.2
500,338,329,40743.334,23827.492,11462.002,QwQ-32B,math,10,0.01
500,338,329,40743.334,23444.214,11297.402,QwQ-32B,math,10,0.05
500,338,329,40743.334,23383.746,11269.388,QwQ-32B,math,10,0.1
500,338,329,40743.334,23353.016,11250.988,QwQ-32B,math,10,0.15
500,338,329,40743.334,23341.576,11244.646,QwQ-32B,math,10,0.2
500,338,336,40743.334,27040.918,13266.494,QwQ-32B,math,15,0.01
500,338,335,40743.334,26769.872,13152.65,QwQ-32B,math,15,0.05
500,338,336,40743.334,26720.576,13130.886,QwQ-32B,math,15,0.1
500,338,336,40743.334,26705.558,13124.932,QwQ-32B,math,15,0.15
500,338,336,40743.334,26695.75,13119.288,QwQ-32B,math,15,0.2
500,338,338,40743.334,29213.486,14424.116,QwQ-32B,math,20,0.01
500,338,338,40743.334,29049.874,14352.23,QwQ-32B,math,20,0.05
500,338,338,40743.334,29008.964,14334.716,QwQ-32B,math,20,0.1
500,338,338,40743.334,28996.48,14330.482,QwQ-32B,math,20,0.15
500,338,337,40743.334,28990.544,14327.346,QwQ-32B,math,20,0.2
272,68,77,52112.92279411765,23124.90073529412,10531.558823529413,QwQ-32B,minerva,3,0.01
272,68,75,52112.92279411765,22728.014705882353,10374.011029411764,QwQ-32B,minerva,3,0.05
272,68,75,52112.92279411765,22592.198529411766,10322.411764705883,QwQ-32B,minerva,3,0.1
272,68,76,52112.92279411765,22550.683823529413,10311.599264705883,QwQ-32B,minerva,3,0.15
272,68,76,52112.92279411765,22538.996323529413,10308.176470588236,QwQ-32B,minerva,3,0.2
272,68,75,52112.92279411765,30952.95588235294,14867.007352941177,QwQ-32B,minerva,5,0.01
272,68,76,52112.92279411765,30676.253676470587,14751.48161764706,QwQ-32B,minerva,5,0.05
272,68,76,52112.92279411765,30630.529411764706,14733.275735294117,QwQ-32B,minerva,5,0.1
272,68,76,52112.92279411765,30597.683823529413,14725.496323529413,QwQ-32B,minerva,5,0.15
272,68,76,52112.92279411765,30592.875,14723.867647058823,QwQ-32B,minerva,5,0.2
272,68,78,52112.92279411765,35418.70220588235,17343.897058823528,QwQ-32B,minerva,7,0.01
272,68,78,52112.92279411765,35225.01470588235,17262.371323529413,QwQ-32B,minerva,7,0.05
272,68,78,52112.92279411765,35192.90073529412,17249.227941176472,QwQ-32B,minerva,7,0.1
272,68,78,52112.92279411765,35171.84926470588,17243.610294117647,QwQ-32B,minerva,7,0.15
272,68,78,52112.92279411765,35170.21323529412,17242.908088235294,QwQ-32B,minerva,7,0.2
272,68,77,52112.92279411765,39628.268382352944,19635.301470588234,QwQ-32B,minerva,10,0.01
272,68,77,52112.92279411765,39483.757352941175,19578.040441176472,QwQ-32B,minerva,10,0.05
272,68,78,52112.92279411765,39461.095588235294,19569.52573529412,QwQ-32B,minerva,10,0.1
272,68,78,52112.92279411765,39454.23529411765,19567.073529411766,QwQ-32B,minerva,10,0.15
272,68,78,52112.92279411765,39453.62132352941,19566.91176470588,QwQ-32B,minerva,10,0.2
272,68,80,52112.92279411765,43898.76102941176,21942.011029411766,QwQ-32B,minerva,15,0.01
272,68,79,52112.92279411765,43810.81617647059,21905.20588235294,QwQ-32B,minerva,15,0.05
272,68,79,52112.92279411765,43805.27573529412,21902.595588235294,QwQ-32B,minerva,15,0.1
272,68,79,52112.92279411765,43801.04044117647,21901.147058823528,QwQ-32B,minerva,15,0.15
272,68,79,52112.92279411765,43801.04044117647,21901.147058823528,QwQ-32B,minerva,15,0.2
272,68,79,52112.92279411765,46463.59926470588,23347.84926470588,QwQ-32B,minerva,20,0.01
272,68,79,52112.92279411765,46427.955882352944,23333.058823529413,QwQ-32B,minerva,20,0.05
272,68,79,52112.92279411765,46427.46323529412,23332.54411764706,QwQ-32B,minerva,20,0.1
272,68,79,52112.92279411765,46424.86397058824,23331.46323529412,QwQ-32B,minerva,20,0.15
272,68,79,52112.92279411765,46424.86397058824,23331.46323529412,QwQ-32B,minerva,20,0.2
675,292,199,91067.87703703703,24125.99111111111,10968.288888888888,QwQ-32B,olympiadbench,3,0.01
675,292,194,91067.87703703703,23309.55851851852,10623.61925925926,QwQ-32B,olympiadbench,3,0.05
675,292,193,91067.87703703703,22994.385185185187,10493.684444444445,QwQ-32B,olympiadbench,3,0.1
675,292,193,91067.87703703703,22890.56148148148,10450.885925925926,QwQ-32B,olympiadbench,3,0.15
675,292,192,91067.87703703703,22836.253333333334,10428.717037037037,QwQ-32B,olympiadbench,3,0.2
675,292,238,91067.87703703703,34311.915555555555,17025.47703703704,QwQ-32B,olympiadbench,5,0.01
675,292,237,91067.87703703703,33644.65037037037,16739.265185185184,QwQ-32B,olympiadbench,5,0.05
675,292,236,91067.87703703703,33473.05925925926,16658.022222222222,QwQ-32B,olympiadbench,5,0.1
675,292,236,91067.87703703703,33377.30074074074,16614.866666666665,QwQ-32B,olympiadbench,5,0.15
675,292,236,91067.87703703703,33342.299259259256,16598.517037037036,QwQ-32B,olympiadbench,5,0.2
675,292,259,91067.87703703703,41606.917037037034,21791.10814814815,QwQ-32B,olympiadbench,7,0.01
675,292,260,91067.87703703703,40994.73925925926,21530.032592592594,QwQ-32B,olympiadbench,7,0.05
675,292,260,91067.87703703703,40846.9762962963,21453.634074074074,QwQ-32B,olympiadbench,7,0.1
675,292,260,91067.87703703703,40781.311111111114,21418.17925925926,QwQ-32B,olympiadbench,7,0.15
675,292,260,91067.87703703703,40750.41185185185,21403.205925925926,QwQ-32B,olympiadbench,7,0.2
675,292,271,91067.87703703703,48948.02222222222,26364.98222222222,QwQ-32B,olympiadbench,10,0.01
675,292,271,91067.87703703703,48400.37777777778,26118.508148148147,QwQ-32B,olympiadbench,10,0.05
675,292,271,91067.87703703703,48246.537777777776,26039.61037037037,QwQ-32B,olympiadbench,10,0.1
675,292,271,91067.87703703703,48202.62222222222,26018.097777777777,QwQ-32B,olympiadbench,10,0.15
675,292,271,91067.87703703703,48178.90814814815,26005.66222222222,QwQ-32B,olympiadbench,10,0.2
675,292,289,91067.87703703703,56547.14666666667,31288.493333333332,QwQ-32B,olympiadbench,15,0.01
675,292,289,91067.87703703703,56199.724444444444,31117.539259259258,QwQ-32B,olympiadbench,15,0.05
675,292,289,91067.87703703703,56094.61037037037,31059.331851851854,QwQ-32B,olympiadbench,15,0.1
675,292,288,91067.87703703703,56067.48148148148,31043.908148148148,QwQ-32B,olympiadbench,15,0.15
675,292,288,91067.87703703703,56018.567407407405,31010.971851851853,QwQ-32B,olympiadbench,15,0.2
675,292,296,91067.87703703703,62016.82518518518,34695.88888888889,QwQ-32B,olympiadbench,20,0.01
675,292,295,91067.87703703703,61720.39555555556,34548.90962962963,QwQ-32B,olympiadbench,20,0.05
675,292,295,91067.87703703703,61648.55851851852,34508.90666666667,QwQ-32B,olympiadbench,20,0.1
675,292,295,91067.87703703703,61629.81925925926,34497.142222222225,QwQ-32B,olympiadbench,20,0.15
675,292,295,91067.87703703703,61611.9362962963,34489.60296296296,QwQ-32B,olympiadbench,20,0.2
30,24,10,146374.93333333332,50095.13333333333,13222.833333333334,Qwen3-8B,aime,3,0.01
30,24,9,146374.93333333332,48949.833333333336,13032.866666666667,Qwen3-8B,aime,3,0.05
30,24,9,146374.93333333332,48683.2,12975.5,Qwen3-8B,aime,3,0.1
30,24,9,146374.93333333332,48632.566666666666,12968.1,Qwen3-8B,aime,3,0.15
30,24,9,146374.93333333332,48609.833333333336,12966.1,Qwen3-8B,aime,3,0.2
30,24,13,146374.93333333332,62224.76666666667,18206.533333333333,Qwen3-8B,aime,5,0.01
30,24,13,146374.93333333332,61276.9,18012.633333333335,Qwen3-8B,aime,5,0.05
30,24,13,146374.93333333332,61093.066666666666,17964.466666666667,Qwen3-8B,aime,5,0.1
30,24,13,146374.93333333332,61059.5,17959.3,Qwen3-8B,aime,5,0.15
30,24,13,146374.93333333332,61056.63333333333,17958.966666666667,Qwen3-8B,aime,5,0.2
30,24,14,146374.93333333332,67986.06666666667,19722.2,Qwen3-8B,aime,7,0.01
30,24,14,146374.93333333332,67171.63333333333,19568.866666666665,Qwen3-8B,aime,7,0.05
30,24,14,146374.93333333332,67107.9,19551.2,Qwen3-8B,aime,7,0.1
30,24,14,146374.93333333332,67083.6,19547.7,Qwen3-8B,aime,7,0.15
30,24,14,146374.93333333332,67083.6,19547.7,Qwen3-8B,aime,7,0.2
30,24,14,146374.93333333332,75577.53333333334,21830.866666666665,Qwen3-8B,aime,10,0.01
30,24,14,146374.93333333332,74977.5,21733.6,Qwen3-8B,aime,10,0.05
30,24,14,146374.93333333332,74809.26666666666,21680.4,Qwen3-8B,aime,10,0.1
30,24,14,146374.93333333332,74805.43333333333,21679.4,Qwen3-8B,aime,10,0.15
30,24,14,146374.93333333332,74805.43333333333,21679.4,Qwen3-8B,aime,10,0.2
30,24,19,146374.93333333332,86527.2,25267.0,Qwen3-8B,aime,15,0.01
30,24,19,146374.93333333332,85972.76666666666,25197.766666666666,Qwen3-8B,aime,15,0.05
30,24,19,146374.93333333332,85761.23333333334,25132.233333333334,Qwen3-8B,aime,15,0.1
30,24,19,146374.93333333332,85761.23333333334,25132.233333333334,Qwen3-8B,aime,15,0.15
30,24,19,146374.93333333332,85761.23333333334,25132.233333333334,Qwen3-8B,aime,15,0.2
30,24,19,146374.93333333332,93516.13333333333,27167.333333333332,Qwen3-8B,aime,20,0.01
30,24,19,146374.93333333332,92942.53333333334,27096.8,Qwen3-8B,aime,20,0.05
30,24,19,146374.93333333332,92935.86666666667,27095.133333333335,Qwen3-8B,aime,20,0.1
30,24,19,146374.93333333332,92935.86666666667,27095.133333333335,Qwen3-8B,aime,20,0.15
30,24,19,146374.93333333332,92935.86666666667,27095.133333333335,Qwen3-8B,aime,20,0.2
40,0,0,81612.25,27371.375,7072.025,Qwen3-8B,amc,3,0.01
40,0,0,81612.25,26281.175,6770.625,Qwen3-8B,amc,3,0.05
40,0,0,81612.25,25833.4,6643.25,Qwen3-8B,amc,3,0.1
40,0,0,81612.25,25633.125,6587.325,Qwen3-8B,amc,3,0.15
40,0,0,81612.25,25427.125,6536.825,Qwen3-8B,amc,3,0.2
40,0,0,81612.25,32964.2,9016.6,Qwen3-8B,amc,5,0.01
40,0,0,81612.25,31596.075,8631.875,Qwen3-8B,amc,5,0.05
40,0,0,81612.25,31082.9,8493.825,Qwen3-8B,amc,5,0.1
40,0,0,81612.25,30845.0,8430.525,Qwen3-8B,amc,5,0.15
40,0,0,81612.25,30637.55,8381.0,Qwen3-8B,amc,5,0.2
40,0,0,81612.25,35440.725,9865.675,Qwen3-8B,amc,7,0.01
40,0,0,81612.25,34111.85,9510.65,Qwen3-8B,amc,7,0.05
40,0,0,81612.25,33610.125,9376.725,Qwen3-8B,amc,7,0.1
40,0,0,81612.25,33352.3,9315.475,Qwen3-8B,amc,7,0.15
40,0,0,81612.25,33152.325,9262.925,Qwen3-8B,amc,7,0.2
40,0,0,81612.25,38935.4,11136.4,Qwen3-8B,amc,10,0.01
40,0,0,81612.25,37957.725,10910.25,Qwen3-8B,amc,10,0.05
40,0,0,81612.25,37428.2,10770.125,Qwen3-8B,amc,10,0.1
40,0,0,81612.25,37213.875,10717.275,Qwen3-8B,amc,10,0.15
40,0,0,81612.25,37072.825,10670.55,Qwen3-8B,amc,10,0.2
40,0,0,81612.25,42772.725,12248.375,Qwen3-8B,amc,15,0.01
40,0,0,81612.25,41935.8,12065.95,Qwen3-8B,amc,15,0.05
40,0,0,81612.25,41596.15,11984.15,Qwen3-8B,amc,15,0.1
40,0,0,81612.25,41471.9,11961.05,Qwen3-8B,amc,15,0.15
40,0,0,81612.25,41431.025,11948.575,Qwen3-8B,amc,15,0.2
40,0,0,81612.25,45660.575,12995.6,Qwen3-8B,amc,20,0.01
40,0,0,81612.25,44850.225,12820.3,Qwen3-8B,amc,20,0.05
40,0,0,81612.25,44533.95,12742.875,Qwen3-8B,amc,20,0.1
40,0,0,81612.25,44435.275,12727.65,Qwen3-8B,amc,20,0.15
40,0,0,81612.25,44419.5,12723.875,Qwen3-8B,amc,20,0.2
198,113,94,79275.60606060606,32862.22727272727,1894.5353535353536,Qwen3-8B,gpqa,3,0.01
198,113,93,79275.60606060606,31167.737373737375,1776.0707070707072,Qwen3-8B,gpqa,3,0.05
198,113,92,79275.60606060606,30416.28787878788,1719.489898989899,Qwen3-8B,gpqa,3,0.1
198,113,92,79275.60606060606,30044.70202020202,1694.7828282828282,Qwen3-8B,gpqa,3,0.15
198,113,91,79275.60606060606,29822.343434343435,1678.9747474747476,Qwen3-8B,gpqa,3,0.2
198,113,100,79275.60606060606,37512.530303030304,2182.989898989899,Qwen3-8B,gpqa,5,0.01
198,113,97,79275.60606060606,35754.69696969697,2046.6515151515152,Qwen3-8B,gpqa,5,0.05
198,113,96,79275.60606060606,35054.18686868687,1990.671717171717,Qwen3-8B,gpqa,5,0.1
198,113,95,79275.60606060606,34746.80808080808,1967.8383838383838,Qwen3-8B,gpqa,5,0.15
198,113,95,79275.60606060606,34577.03535353536,1954.4040404040404,Qwen3-8B,gpqa,5,0.2
198,113,105,79275.60606060606,41561.954545454544,2465.217171717172,Qwen3-8B,gpqa,7,0.01
198,113,103,79275.60606060606,39918.444444444445,2327.0757575757575,Qwen3-8B,gpqa,7,0.05
198,113,103,79275.60606060606,39257.333333333336,2267.5858585858587,Qwen3-8B,gpqa,7,0.1
198,113,102,79275.60606060606,39004.98484848485,2249.6414141414143,Qwen3-8B,gpqa,7,0.15
198,113,101,79275.60606060606,38868.26262626263,2236.7272727272725,Qwen3-8B,gpqa,7,0.2
198,113,111,79275.60606060606,46650.520202020205,2846.919191919192,Qwen3-8B,gpqa,10,0.01
198,113,106,79275.60606060606,45096.11616161616,2716.949494949495,Qwen3-8B,gpqa,10,0.05
198,113,105,79275.60606060606,44499.47474747475,2661.8232323232323,Qwen3-8B,gpqa,10,0.1
198,113,105,79275.60606060606,44317.5,2646.8333333333335,Qwen3-8B,gpqa,10,0.15
198,113,104,79275.60606060606,44186.09090909091,2634.2676767676767,Qwen3-8B,gpqa,10,0.2
198,113,110,79275.60606060606,52882.368686868685,3365.121212121212,Qwen3-8B,gpqa,15,0.01
198,113,109,79275.60606060606,51488.080808080806,3237.212121212121,Qwen3-8B,gpqa,15,0.05
198,113,108,79275.60606060606,50941.621212121216,3185.919191919192,Qwen3-8B,gpqa,15,0.1
198,113,108,79275.60606060606,50780.5505050505,3171.378787878788,Qwen3-8B,gpqa,15,0.15
198,113,107,79275.60606060606,50685.969696969696,3160.8939393939395,Qwen3-8B,gpqa,15,0.2
198,113,113,79275.60606060606,57127.57070707071,3744.277777777778,Qwen3-8B,gpqa,20,0.01
198,113,112,79275.60606060606,55888.9898989899,3622.5151515151515,Qwen3-8B,gpqa,20,0.05
198,113,112,79275.60606060606,55323.27777777778,3568.020202020202,Qwen3-8B,gpqa,20,0.1
198,113,112,79275.60606060606,55193.33838383838,3556.343434343434,Qwen3-8B,gpqa,20,0.15
198,113,112,79275.60606060606,55087.242424242424,3546.7676767676767,Qwen3-8B,gpqa,20,0.2
500,347,284,48169.754,17299.216,3805.148,Qwen3-8B,math,3,0.01
500,347,273,48169.754,15943.16,3496.638,Qwen3-8B,math,3,0.05
500,347,266,48169.754,15480.676,3387.536,Qwen3-8B,math,3,0.1
500,347,265,48169.754,15310.504,3346.092,Qwen3-8B,math,3,0.15
500,347,264,48169.754,15215.368,3321.176,Qwen3-8B,math,3,0.2
500,347,305,48169.754,19964.11,4701.0,Qwen3-8B,math,5,0.01
500,347,298,48169.754,18704.282,4408.3,Qwen3-8B,math,5,0.05
500,347,294,48169.754,18334.042,4322.01,Qwen3-8B,math,5,0.1
500,347,294,48169.754,18175.96,4282.378,Qwen3-8B,math,5,0.15
500,347,294,48169.754,18118.766,4267.526,Qwen3-8B,math,5,0.2
500,347,319,48169.754,21928.746,5314.466,Qwen3-8B,math,7,0.01
500,347,314,48169.754,20804.56,5061.656,Qwen3-8B,math,7,0.05
500,347,310,48169.754,20463.692,4982.3,Qwen3-8B,math,7,0.1
500,347,310,48169.754,20313.79,4946.02,Qwen3-8B,math,7,0.15
500,347,311,48169.754,20258.128,4931.414,Qwen3-8B,math,7,0.2
500,347,325,48169.754,24356.066,6043.948,Qwen3-8B,math,10,0.01
500,347,322,48169.754,23369.04,5833.376,Qwen3-8B,math,10,0.05
500,347,322,48169.754,23093.926,5768.878,Qwen3-8B,math,10,0.1
500,347,322,48169.754,22977.844,5742.804,Qwen3-8B,math,10,0.15
500,347,322,48169.754,22930.81,5728.818,Qwen3-8B,math,10,0.2
500,347,330,48169.754,27600.186,6959.69,Qwen3-8B,math,15,0.01
500,347,331,48169.754,26789.52,6786.398,Qwen3-8B,math,15,0.05
500,347,331,48169.754,26584.876,6742.612,Qwen3-8B,math,15,0.1
500,347,331,48169.754,26533.946,6729.386,Qwen3-8B,math,15,0.15
500,347,331,48169.754,26520.092,6725.716,Qwen3-8B,math,15,0.2
500,347,334,48169.754,30371.614,7778.01,Qwen3-8B,math,20,0.01
500,347,334,48169.754,29721.408,7641.092,Qwen3-8B,math,20,0.05
500,347,334,48169.754,29556.362,7604.908,Qwen3-8B,math,20,0.1
500,347,333,48169.754,29509.036,7591.76,Qwen3-8B,math,20,0.15
500,347,333,48169.754,29497.64,7588.81,Qwen3-8B,math,20,0.2
272,76,59,60050.533088235294,17553.029411764706,4538.261029411765,Qwen3-8B,minerva,3,0.01
272,76,59,60050.533088235294,16536.930147058825,4241.0625,Qwen3-8B,minerva,3,0.05
272,76,58,60050.533088235294,16227.27205882353,4150.952205882353,Qwen3-8B,minerva,3,0.1
272,76,58,60050.533088235294,16060.507352941177,4103.974264705882,Qwen3-8B,minerva,3,0.15
272,76,57,60050.533088235294,15984.985294117647,4085.422794117647,Qwen3-8B,minerva,3,0.2
272,76,65,60050.533088235294,22690.110294117647,6682.533088235294,Qwen3-8B,minerva,5,0.01
272,76,65,60050.533088235294,21769.71323529412,6374.676470588235,Qwen3-8B,minerva,5,0.05
272,76,65,60050.533088235294,21613.720588235294,6324.827205882353,Qwen3-8B,minerva,5,0.1
272,76,64,60050.533088235294,21497.488970588234,6294.5,Qwen3-8B,minerva,5,0.15
272,76,63,60050.533088235294,21452.488970588234,6284.283088235294,Qwen3-8B,minerva,5,0.2
272,76,70,60050.533088235294,26662.96323529412,8304.841911764706,Qwen3-8B,minerva,7,0.01
272,76,70,60050.533088235294,25968.819852941175,8103.731617647059,Qwen3-8B,minerva,7,0.05
272,76,70,60050.533088235294,25835.801470588234,8062.169117647059,Qwen3-8B,minerva,7,0.1
272,76,67,60050.533088235294,25709.26838235294,8028.511029411765,Qwen3-8B,minerva,7,0.15
272,76,67,60050.533088235294,25671.547794117647,8020.176470588235,Qwen3-8B,minerva,7,0.2
272,76,69,60050.533088235294,31162.35661764706,10162.1875,Qwen3-8B,minerva,10,0.01
272,76,69,60050.533088235294,30705.305147058825,10029.76838235294,Qwen3-8B,minerva,10,0.05
272,76,70,60050.533088235294,30582.40073529412,9987.904411764706,Qwen3-8B,minerva,10,0.1
272,76,69,60050.533088235294,30502.926470588234,9968.551470588236,Qwen3-8B,minerva,10,0.15
272,76,69,60050.533088235294,30479.867647058825,9964.194852941177,Qwen3-8B,minerva,10,0.2
272,76,70,60050.533088235294,35957.58455882353,12062.345588235294,Qwen3-8B,minerva,15,0.01
272,76,71,60050.533088235294,35586.242647058825,11945.158088235294,Qwen3-8B,minerva,15,0.05
272,76,71,60050.533088235294,35487.74632352941,11910.35294117647,Qwen3-8B,minerva,15,0.1
272,76,70,60050.533088235294,35430.57352941176,11886.23161764706,Qwen3-8B,minerva,15,0.15
272,76,70,60050.533088235294,35408.169117647056,11881.930147058823,Qwen3-8B,minerva,15,0.2
272,76,71,60050.533088235294,39261.492647058825,13357.422794117647,Qwen3-8B,minerva,20,0.01
272,76,71,60050.533088235294,38989.36029411765,13266.73161764706,Qwen3-8B,minerva,20,0.05
272,76,71,60050.533088235294,38920.819852941175,13243.757352941177,Qwen3-8B,minerva,20,0.1
272,76,71,60050.533088235294,38888.16544117647,13235.39705882353,Qwen3-8B,minerva,20,0.15
272,76,71,60050.533088235294,38878.029411764706,13234.224264705883,Qwen3-8B,minerva,20,0.2
675,295,176,104996.08444444444,25055.783703703702,6371.637037037037,Qwen3-8B,olympiadbench,3,0.01
675,295,162,104996.08444444444,23081.42962962963,5888.488888888889,Qwen3-8B,olympiadbench,3,0.05
675,295,161,104996.08444444444,22431.65037037037,5713.788148148148,Qwen3-8B,olympiadbench,3,0.1
675,295,157,104996.08444444444,22111.094814814816,5632.77037037037,Qwen3-8B,olympiadbench,3,0.15
675,295,156,104996.08444444444,21928.91259259259,5579.432592592592,Qwen3-8B,olympiadbench,3,0.2
675,295,214,104996.08444444444,32294.875555555554,8880.804444444444,Qwen3-8B,olympiadbench,5,0.01
675,295,207,104996.08444444444,30449.574814814816,8415.66962962963,Qwen3-8B,olympiadbench,5,0.05
675,295,206,104996.08444444444,29971.757037037038,8292.994074074075,Qwen3-8B,olympiadbench,5,0.1
675,295,205,104996.08444444444,29703.277037037038,8216.451851851853,Qwen3-8B,olympiadbench,5,0.15
675,295,205,104996.08444444444,29576.804444444446,8178.091851851852,Qwen3-8B,olympiadbench,5,0.2
675,295,237,104996.08444444444,38540.74518518519,10969.595555555556,Qwen3-8B,olympiadbench,7,0.01
675,295,229,104996.08444444444,36791.56,10540.685925925925,Qwen3-8B,olympiadbench,7,0.05
675,295,227,104996.08444444444,36358.97037037037,10431.25925925926,Qwen3-8B,olympiadbench,7,0.1
675,295,225,104996.08444444444,36136.73037037037,10368.705185185185,Qwen3-8B,olympiadbench,7,0.15
675,295,225,104996.08444444444,36038.537777777776,10340.475555555555,Qwen3-8B,olympiadbench,7,0.2
675,295,251,104996.08444444444,45873.98814814815,13421.303703703703,Qwen3-8B,olympiadbench,10,0.01
675,295,250,104996.08444444444,44378.53333333333,13037.774814814815,Qwen3-8B,olympiadbench,10,0.05
675,295,249,104996.08444444444,43924.69925925926,12904.474074074074,Qwen3-8B,olympiadbench,10,0.1
675,295,249,104996.08444444444,43751.17333333333,12851.118518518519,Qwen3-8B,olympiadbench,10,0.15
675,295,249,104996.08444444444,43670.98814814815,12824.85037037037,Qwen3-8B,olympiadbench,10,0.2
675,295,261,104996.08444444444,54756.26814814815,16370.22074074074,Qwen3-8B,olympiadbench,15,0.01
675,295,259,104996.08444444444,53528.38518518519,16038.16888888889,Qwen3-8B,olympiadbench,15,0.05
675,295,259,104996.08444444444,53170.49481481482,15938.493333333334,Qwen3-8B,olympiadbench,15,0.1
675,295,259,104996.08444444444,53022.83851851852,15887.217777777778,Qwen3-8B,olympiadbench,15,0.15
675,295,259,104996.08444444444,52961.845925925925,15867.706666666667,Qwen3-8B,olympiadbench,15,0.2
675,295,274,104996.08444444444,61594.14074074074,18652.103703703702,Qwen3-8B,olympiadbench,20,0.01
675,295,274,104996.08444444444,60525.207407407404,18350.04,Qwen3-8B,olympiadbench,20,0.05
675,295,274,104996.08444444444,60226.69037037037,18271.134814814814,Qwen3-8B,olympiadbench,20,0.1
675,295,274,104996.08444444444,60089.137777777774,18220.50222222222,Qwen3-8B,olympiadbench,20,0.15
675,295,274,104996.08444444444,60044.36148148148,18206.773333333334,Qwen3-8B,olympiadbench,20,0.2
