abs-bvv-4 Total parameters:     1.9B
abs-bvv-4 MMLU [high_school_european_history]: 14.18% ± 1.71% (σ=2.77%)
abs-bvv-4 MMLU [business_ethics]: 17.80% ± 1.83% (σ=2.96%)
abs-bvv-4 MMLU [clinical_knowledge]: 26.94% ± 0.86% (σ=1.38%)
abs-bvv-4 MMLU [medical_genetics]: 23.80% ± 1.96% (σ=3.16%)
abs-bvv-4 MMLU [high_school_us_history]: 14.66% ± 0.86% (σ=1.39%)
abs-bvv-4 MMLU [high_school_physics]: 20.99% ± 1.07% (σ=1.73%)
abs-bvv-4 MMLU [high_school_world_history]: 13.67% ± 1.10% (σ=1.77%)
abs-bvv-4 MMLU [virology]: 21.02% ± 1.23% (σ=1.99%)
abs-bvv-4 MMLU [high_school_microeconomics]: 27.39% ± 1.65% (σ=2.66%)
abs-bvv-4 MMLU [econometrics]: 21.75% ± 2.00% (σ=3.23%)
abs-bvv-4 MMLU [college_computer_science]: 22.60% ± 2.04% (σ=3.29%)
abs-bvv-4 MMLU [high_school_biology]: 24.74% ± 1.17% (σ=1.88%)
abs-bvv-4 MMLU [abstract_algebra]: 10.60% ± 1.31% (σ=2.11%)
abs-bvv-4 MMLU [professional_accounting]: 19.47% ± 0.95% (σ=1.53%)
abs-bvv-4 MMLU [philosophy]: 22.12% ± 0.94% (σ=1.52%)
abs-bvv-4 MMLU [professional_medicine]: 33.71% ± 1.08% (σ=1.75%)
abs-bvv-4 MMLU [nutrition]: 24.05% ± 0.80% (σ=1.28%)
abs-bvv-4 MMLU [global_facts]: 16.10% ± 1.91% (σ=3.08%)
abs-bvv-4 MMLU [machine_learning]: 12.59% ± 1.27% (σ=2.05%)
abs-bvv-4 MMLU [security_studies]: 20.98% ± 1.06% (σ=1.70%)
abs-bvv-4 MMLU [public_relations]: 24.36% ± 1.44% (σ=2.33%)
abs-bvv-4 MMLU [professional_psychology]: 19.51% ± 0.64% (σ=1.04%)
abs-bvv-4 MMLU [prehistory]: 21.30% ± 1.22% (σ=1.98%)
abs-bvv-4 MMLU [anatomy]: 23.26% ± 1.98% (σ=3.20%)
abs-bvv-4 MMLU [human_sexuality]: 25.80% ± 2.30% (σ=3.71%)
abs-bvv-4 MMLU [college_medicine]: 25.95% ± 2.39% (σ=3.86%)
abs-bvv-4 MMLU [high_school_government_and_politics]: 23.26% ± 1.63% (σ=2.64%)
abs-bvv-4 MMLU [college_chemistry]: 27.70% ± 2.53% (σ=4.08%)
abs-bvv-4 MMLU [logical_fallacies]: 18.34% ± 0.94% (σ=1.51%)
abs-bvv-4 MMLU [high_school_geography]: 27.88% ± 1.53% (σ=2.46%)
abs-bvv-4 MMLU [elementary_mathematics]: 20.71% ± 0.71% (σ=1.15%)
abs-bvv-4 MMLU [human_aging]: 19.78% ± 1.07% (σ=1.72%)
abs-bvv-4 MMLU [college_mathematics]: 21.80% ± 1.83% (σ=2.96%)
abs-bvv-4 MMLU [high_school_psychology]: 27.30% ± 1.48% (σ=2.38%)
abs-bvv-4 MMLU [formal_logic]: 23.10% ± 1.91% (σ=3.08%)
abs-bvv-4 MMLU [high_school_statistics]: 23.61% ± 1.12% (σ=1.80%)
abs-bvv-4 MMLU [international_law]: 8.26% ± 0.94% (σ=1.52%)
abs-bvv-4 MMLU [high_school_mathematics]: 19.67% ± 1.07% (σ=1.72%)
abs-bvv-4 MMLU [high_school_computer_science]: 13.00% ± 2.00% (σ=3.22%)
abs-bvv-4 MMLU [conceptual_physics]: 25.62% ± 1.46% (σ=2.35%)
abs-bvv-4 MMLU [miscellaneous]: 19.31% ± 0.40% (σ=0.65%)
abs-bvv-4 MMLU [high_school_chemistry]: 23.40% ± 0.82% (σ=1.33%)
abs-bvv-4 MMLU [marketing]: 20.94% ± 1.10% (σ=1.78%)
abs-bvv-4 MMLU [professional_law]: 12.64% ± 0.20% (σ=0.32%)
abs-bvv-4 MMLU [management]: 29.13% ± 2.60% (σ=4.19%)
abs-bvv-4 MMLU [college_physics]: 25.69% ± 1.88% (σ=3.03%)
abs-bvv-4 MMLU [jurisprudence]: 20.93% ± 1.99% (σ=3.21%)
abs-bvv-4 MMLU [world_religions]: 16.61% ± 1.34% (σ=2.16%)
abs-bvv-4 MMLU [sociology]: 21.89% ± 1.23% (σ=1.99%)
abs-bvv-4 MMLU [us_foreign_policy]: 18.80% ± 1.61% (σ=2.60%)
abs-bvv-4 MMLU [high_school_macroeconomics]: 27.54% ± 1.44% (σ=2.33%)
abs-bvv-4 MMLU [computer_security]: 17.90% ± 1.67% (σ=2.70%)
abs-bvv-4 MMLU [moral_scenarios]: 23.36% ± 0.65% (σ=1.05%)
abs-bvv-4 MMLU [moral_disputes]: 20.38% ± 1.29% (σ=2.08%)
abs-bvv-4 MMLU [electrical_engineering]: 20.28% ± 0.94% (σ=1.52%)
abs-bvv-4 MMLU [astronomy]: 21.97% ± 0.58% (σ=0.94%)
abs-bvv-4 MMLU [college_biology]: 20.76% ± 1.30% (σ=2.09%)
abs-bvv-4 MMLU: 21.03% ± 0.16% (σ=0.27%)
abs-bvv-4 ARC-e: 21.14% ± 0.59% (σ=0.95%)
abs-bvv-4 ARC-c: 24.78% ± 0.97% (σ=1.56%)
abs-bvv-4 C-SENSE: 19.99% ± 0.72% (σ=1.16%)
abs-bvv-4 SQUAD: 3.16% ± 0.73% (σ=1.18%)
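
The result lines above share one shape: `<model> <benchmark> [<subtask>]: <mean>% ± <margin>% (σ=<sigma>%)`. Below is a minimal sketch for loading them into records, assuming that shape holds for every line. The names `LINE_RE`, `parse_line`, and `approx_margin` are hypothetical, not part of any tool that produced this log. The helper `approx_margin` encodes an observation rather than a documented fact: the reported ± values are numerically consistent with 1.96·σ/√10 (a normal-approximation 95% interval over 10 runs), but the log itself does not state how they were computed.

```python
import math
import re

# Assumed line shape, inferred from the log itself (not from a documented spec):
#   <model> <benchmark> [<subtask>]: <mean>% ± <margin>% (σ=<sigma>%)
LINE_RE = re.compile(
    r"^(?P<model>\S+) (?P<bench>[\w-]+)"
    r"(?: \[(?P<subtask>\w+)\])?: "
    r"(?P<mean>[\d.]+)% ± (?P<margin>[\d.]+)% \(σ=(?P<sigma>[\d.]+)%\)$"
)


def parse_line(line):
    """Return a dict for a result line, or None for lines of another shape
    (for example the 'Total parameters' header)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "model": m["model"],
        "bench": m["bench"],
        "subtask": m["subtask"],  # None for aggregate lines such as 'MMLU:'
        "mean": float(m["mean"]),
        "margin": float(m["margin"]),
        "sigma": float(m["sigma"]),
    }


def approx_margin(sigma, n_runs=10, z=1.96):
    """Hypothesised relation between σ and ±; n_runs=10 is an assumption."""
    return z * sigma / math.sqrt(n_runs)


sample = [
    "abs-bvv-4 Total parameters:     1.9B",
    "abs-bvv-4 MMLU [high_school_european_history]: 14.18% ± 1.71% (σ=2.77%)",
    "abs-bvv-4 MMLU: 21.03% ± 0.16% (σ=0.27%)",
    "abs-bvv-4 SQUAD: 3.16% ± 0.73% (σ=1.18%)",
]
records = [r for r in map(parse_line, sample) if r is not None]
for r in records:
    label = r["bench"] + (f" [{r['subtask']}]" if r["subtask"] else "")
    print(f"{label}: mean={r['mean']:.2f} margin={r['margin']:.2f} "
          f"hypothesised={approx_margin(r['sigma']):.2f}")
```

If the hypothesised relation is wrong for a given run, only `approx_margin` needs to change; `parse_line` reads the reported values as-is.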
