max_bvv_zh Total parameters:     0.4B
max_bvv_zh MMLU [high_school_european_history]: 18.30% ± 0.87% (σ=1.40%)
max_bvv_zh MMLU [business_ethics]: 21.70% ± 2.24% (σ=3.61%)
max_bvv_zh MMLU [clinical_knowledge]: 28.15% ± 1.24% (σ=2.00%)
max_bvv_zh MMLU [medical_genetics]: 23.30% ± 2.11% (σ=3.41%)
max_bvv_zh MMLU [high_school_us_history]: 19.66% ± 0.92% (σ=1.48%)
max_bvv_zh MMLU [high_school_physics]: 28.34% ± 1.45% (σ=2.35%)
max_bvv_zh MMLU [high_school_world_history]: 18.52% ± 1.06% (σ=1.71%)
max_bvv_zh MMLU [virology]: 23.49% ± 1.98% (σ=3.19%)
max_bvv_zh MMLU [high_school_microeconomics]: 31.05% ± 1.32% (σ=2.13%)
max_bvv_zh MMLU [econometrics]: 24.21% ± 2.02% (σ=3.26%)
max_bvv_zh MMLU [college_computer_science]: 27.20% ± 1.75% (σ=2.82%)
max_bvv_zh MMLU [high_school_biology]: 29.19% ± 1.60% (σ=2.59%)
max_bvv_zh MMLU [abstract_algebra]: 22.50% ± 2.04% (σ=3.29%)
max_bvv_zh MMLU [professional_accounting]: 23.33% ± 1.28% (σ=2.06%)
max_bvv_zh MMLU [philosophy]: 24.28% ± 1.53% (σ=2.46%)
max_bvv_zh MMLU [professional_medicine]: 33.31% ± 1.31% (σ=2.11%)
max_bvv_zh MMLU [nutrition]: 26.73% ± 0.92% (σ=1.49%)
max_bvv_zh MMLU [global_facts]: 19.70% ± 1.52% (σ=2.45%)
max_bvv_zh MMLU [machine_learning]: 17.77% ± 1.34% (σ=2.17%)
max_bvv_zh MMLU [security_studies]: 24.69% ± 1.15% (σ=1.85%)
max_bvv_zh MMLU [public_relations]: 23.36% ± 2.42% (σ=3.90%)
max_bvv_zh MMLU [professional_psychology]: 23.55% ± 0.95% (σ=1.53%)
max_bvv_zh MMLU [prehistory]: 23.89% ± 0.89% (σ=1.44%)
max_bvv_zh MMLU [anatomy]: 24.30% ± 1.53% (σ=2.47%)
max_bvv_zh MMLU [human_sexuality]: 27.40% ± 2.15% (σ=3.47%)
max_bvv_zh MMLU [college_medicine]: 27.28% ± 1.56% (σ=2.52%)
max_bvv_zh MMLU [high_school_government_and_politics]: 28.08% ± 1.74% (σ=2.81%)
max_bvv_zh MMLU [college_chemistry]: 35.90% ± 2.06% (σ=3.33%)
max_bvv_zh MMLU [logical_fallacies]: 24.05% ± 1.26% (σ=2.03%)
max_bvv_zh MMLU [high_school_geography]: 28.03% ± 1.60% (σ=2.59%)
max_bvv_zh MMLU [elementary_mathematics]: 25.53% ± 1.17% (σ=1.89%)
max_bvv_zh MMLU [human_aging]: 18.83% ± 1.17% (σ=1.88%)
max_bvv_zh MMLU [college_mathematics]: 28.30% ± 2.51% (σ=4.05%)
max_bvv_zh MMLU [high_school_psychology]: 31.05% ± 0.99% (σ=1.59%)
max_bvv_zh MMLU [formal_logic]: 26.59% ± 1.48% (σ=2.39%)
max_bvv_zh MMLU [high_school_statistics]: 33.47% ± 1.38% (σ=2.22%)
max_bvv_zh MMLU [international_law]: 13.06% ± 1.14% (σ=1.84%)
max_bvv_zh MMLU [high_school_mathematics]: 24.85% ± 1.30% (σ=2.09%)
max_bvv_zh MMLU [high_school_computer_science]: 18.10% ± 1.74% (σ=2.81%)
max_bvv_zh MMLU [conceptual_physics]: 22.43% ± 1.28% (σ=2.06%)
max_bvv_zh MMLU [miscellaneous]: 22.41% ± 0.67% (σ=1.09%)
max_bvv_zh MMLU [high_school_chemistry]: 25.67% ± 1.18% (σ=1.90%)
max_bvv_zh MMLU [marketing]: 22.69% ± 1.03% (σ=1.66%)
max_bvv_zh MMLU [professional_law]: 21.23% ± 0.59% (σ=0.95%)
max_bvv_zh MMLU [management]: 32.14% ± 1.69% (σ=2.73%)
max_bvv_zh MMLU [college_physics]: 28.82% ± 2.26% (σ=3.65%)
max_bvv_zh MMLU [jurisprudence]: 25.56% ± 1.85% (σ=2.99%)
max_bvv_zh MMLU [world_religions]: 19.12% ± 1.12% (σ=1.81%)
max_bvv_zh MMLU [sociology]: 24.98% ± 1.70% (σ=2.75%)
max_bvv_zh MMLU [us_foreign_policy]: 22.20% ± 1.70% (σ=2.75%)
max_bvv_zh MMLU [high_school_macroeconomics]: 30.74% ± 0.81% (σ=1.31%)
max_bvv_zh MMLU [computer_security]: 21.00% ± 2.02% (σ=3.26%)
max_bvv_zh MMLU [moral_scenarios]: 25.35% ± 0.50% (σ=0.81%)
max_bvv_zh MMLU [moral_disputes]: 21.68% ± 1.09% (σ=1.76%)
max_bvv_zh MMLU [electrical_engineering]: 23.86% ± 0.90% (σ=1.45%)
max_bvv_zh MMLU [astronomy]: 27.30% ± 1.02% (σ=1.64%)
max_bvv_zh MMLU [college_biology]: 24.65% ± 1.58% (σ=2.55%)
max_bvv_zh MMLU: 24.72% ± 0.13% (σ=0.21%)
max_bvv_zh ARC-e: 21.93% ± 0.43% (σ=0.70%)
max_bvv_zh ARC-c: 26.49% ± 1.05% (σ=1.69%)
max_bvv_zh C-SENSE: 20.21% ± 0.51% (σ=0.83%)
max_bvv_zh SQUAD: 17.66% ± 0.99% (σ=1.60%)
max_bvv_zh BLEU [en-ru]: 1.28% ± 0.22% (σ=0.35%)
max_bvv_zh BLEU [ru-en]: 3.47% ± 0.36% (σ=0.59%)
max_bvv_zh BLEU [en-zh]: 2.36% ± 0.21% (σ=0.33%)
max_bvv_zh BLEU [zh-en]: 7.44% ± 0.34% (σ=0.54%)
