,Llama2-70B FT,Llama2-13B FT,Llama2-7B FT,Mistral-7B FT,GPT-4,Llama2-70B Base,Llama2-7B Base,Llama2-13B Base,Mistral-7B
Llama2-70B HG,73.38096739168506,57.333333333333336,74.59712629991267,70.20609791231881,43.488457517447905,71.26436781609196,77.47176931091775,76.96958065444774,77.46498109277769
Llama3.1-8B HG,82.51281179413694,63.32650199546974,83.0524795377949,74.5055573982661,63.06555863342569,80.83045219594015,82.69100743745776,81.21746451962856,86.12818815221159
Llama2-13B HG,71.42833412237641,70.4032556418794,78.21350762527234,67.94153405193201,56.7567567567567,75.3205246856452,82.35894043385981,85.01030541502718,82.82647584973166
Llama3-8B HG,59.66386554621849,37.939136024115086,60.69439895185062,60.113013129466516,45.84253127299487,64.26461476390149,66.61566281819447,67.94871794871796,71.97996567545796
Mistral-7B,71.42833412237641,66.35491558688643,78.07875988413059,75.39487803377736,36.249488488294396,61.82477572055739,79.12087912087912,78.6267699706118,76.46591134963228
GPT-4 HG,90.56892933770307,87.70460308921845,92.42036647528093,81.9632953059476,76.17452162906706,83.06878306878306,92.09064125126056,91.91737065850114,92.75165352903868
Llama3-8B,64.30706029918093,51.15317365601844,69.2646827271296,69.30347153502838,17.743966543051574,65.64229814850165,76.76734152007965,73.10924369747897,60.915096543959905
Llama2-13B,19.38703728349526,35.910171696650025,75.49249457646403,38.499615622597645,25.54112554112557,43.143337318873904,72.86903550700991,65.44667955335586,61.17858534218207
Gemma-2B HG,33.448748791510226,19.94996873045654,26.560587515299893,27.012926935827913,16.943521594684448,15.428571428571406,36.862554638173876,24.571584093935584,33.98082904843519
Llama3.1-70B HG,91.7312661498708,77.48322696776304,92.90618428719819,82.71968581246931,78.23129251700684,89.57037037037038,93.09387679210577,92.29880631497882,93.79960317460318
Gemma-2B,63.84180790960452,7.494796582307742,541.6403785488958,42.12367335675234,1.4005602240894262,60.258320914058615,75.39487803377736,72.48682326782956,582.6672833147636
Llama2-70B,68.7044347238522,44.622306827031245,79.16666666666667,68.09116809116809,53.524790987850764,77.29488271079357,84.56407287962735,48.79424281216483,84.91494158639065
Mistral-7B HG,66.44025741036602,57.333333333333336,70.13333333333333,68.07594564509695,45.650697029810644,67.68397152846457,74.4098266265754,74.78196910791218,73.95833333333334
Contains,70.52915458987567,77.48297149719474,64.40188268916201,64.41972199775743,29.234851835471037,56.978600268030156,76.60461804495982,70.3162301612305,67.24533245987553
GPT-4,90.67388922134747,87.76009791921663,92.88888888888889,77.59160703827253,64.83516483516487,72.5037037037037,91.44362148720553,87.69420089217044,89.4304077841938
Llama3-70B HG,88.30561330561329,82.25423419191524,91.86247584172514,81.49995705347173,84.49311882147704,85.19231481138712,92.47129681912291,91.52756393023276,94.34135760262184
Llama2-7B,33.40625623456409,46.393934050432016,74.46308639137874,43.96535492374315,56.51878524356275,67.81842818428187,83.48857182804146,78.9186507936508,77.3809523809524
JudgeLM-7B HG,59.85663082437275,55.910719206392955,70.4032556418794,63.856510346073925,34.71582181259597,72.54901960784316,76.08969145971042,77.18044229672138,77.62863534675616
Llama2-7B HG,35.33410711324825,41.12841701604792,41.406249999999986,42.95634920634921,28.727228042149207,58.93886289979614,54.11405715858946,59.31283905967448,61.54927406117656
EM,27.476014202613886,-38.38595843808384,33.33333333333334,15.966386554621842,11.627172983164277,48.345439870863615,68.74684948079442,63.481519332311365,72.84848484848484
Llama3-70B,79.86509614416593,71.2,85.55770470664088,76.97317776840639,61.59754224270346,76.97438400220246,85.32917660003667,83.49712437392216,89.3114532493968
JudgeLM-7B,58.147029485417725,43.31410061747138,62.47161916199124,61.34901960784313,23.437499999999964,44.94774083339405,71.88839693583527,62.33601056840301,59.60257670051314
