best_bvv_zh Total parameters: 0.5B
best_bvv_zh MMLU [high_school_european_history]: 12.06% ± 1.11% (σ=1.79%)
best_bvv_zh MMLU [business_ethics]: 18.70% ± 1.80% (σ=2.90%)
best_bvv_zh MMLU [clinical_knowledge]: 22.98% ± 1.52% (σ=2.45%)
best_bvv_zh MMLU [medical_genetics]: 21.50% ± 2.32% (σ=3.75%)
best_bvv_zh MMLU [high_school_us_history]: 13.48% ± 1.33% (σ=2.14%)
best_bvv_zh MMLU [high_school_physics]: 17.35% ± 1.41% (σ=2.27%)
best_bvv_zh MMLU [high_school_world_history]: 13.00% ± 0.90% (σ=1.46%)
best_bvv_zh MMLU [virology]: 23.98% ± 1.54% (σ=2.48%)
best_bvv_zh MMLU [high_school_microeconomics]: 21.30% ± 0.72% (σ=1.16%)
best_bvv_zh MMLU [econometrics]: 19.39% ± 1.98% (σ=3.20%)
best_bvv_zh MMLU [college_computer_science]: 18.10% ± 2.85% (σ=4.59%)
best_bvv_zh MMLU [high_school_biology]: 23.00% ± 1.15% (σ=1.86%)
best_bvv_zh MMLU [abstract_algebra]: 13.70% ± 2.35% (σ=3.80%)
best_bvv_zh MMLU [professional_accounting]: 20.85% ± 0.94% (σ=1.51%)
best_bvv_zh MMLU [philosophy]: 19.49% ± 0.79% (σ=1.28%)
best_bvv_zh MMLU [professional_medicine]: 20.66% ± 0.93% (σ=1.50%)
best_bvv_zh MMLU [nutrition]: 20.16% ± 1.27% (σ=2.05%)
best_bvv_zh MMLU [global_facts]: 21.70% ± 2.50% (σ=4.03%)
best_bvv_zh MMLU [machine_learning]: 19.29% ± 1.36% (σ=2.19%)
best_bvv_zh MMLU [security_studies]: 16.61% ± 0.88% (σ=1.43%)
best_bvv_zh MMLU [public_relations]: 21.64% ± 1.93% (σ=3.12%)
best_bvv_zh MMLU [professional_psychology]: 19.93% ± 0.97% (σ=1.57%)
best_bvv_zh MMLU [prehistory]: 20.43% ± 1.15% (σ=1.86%)
best_bvv_zh MMLU [anatomy]: 20.22% ± 0.96% (σ=1.56%)
best_bvv_zh MMLU [human_sexuality]: 18.47% ± 2.15% (σ=3.46%)
best_bvv_zh MMLU [college_medicine]: 20.98% ± 1.24% (σ=2.00%)
best_bvv_zh MMLU [high_school_government_and_politics]: 23.21% ± 1.31% (σ=2.12%)
best_bvv_zh MMLU [college_chemistry]: 20.70% ± 2.20% (σ=3.55%)
best_bvv_zh MMLU [logical_fallacies]: 13.80% ± 1.55% (σ=2.50%)
best_bvv_zh MMLU [high_school_geography]: 24.34% ± 1.14% (σ=1.83%)
best_bvv_zh MMLU [elementary_mathematics]: 15.56% ± 0.89% (σ=1.43%)
best_bvv_zh MMLU [human_aging]: 21.12% ± 1.09% (σ=1.75%)
best_bvv_zh MMLU [college_mathematics]: 15.70% ± 2.18% (σ=3.52%)
best_bvv_zh MMLU [high_school_psychology]: 20.81% ± 0.69% (σ=1.11%)
best_bvv_zh MMLU [formal_logic]: 16.83% ± 1.35% (σ=2.18%)
best_bvv_zh MMLU [high_school_statistics]: 21.85% ± 1.65% (σ=2.66%)
best_bvv_zh MMLU [international_law]: 13.14% ± 1.18% (σ=1.90%)
best_bvv_zh MMLU [high_school_mathematics]: 16.85% ± 0.85% (σ=1.37%)
best_bvv_zh MMLU [high_school_computer_science]: 15.70% ± 1.39% (σ=2.24%)
best_bvv_zh MMLU [conceptual_physics]: 18.26% ± 1.35% (σ=2.17%)
best_bvv_zh MMLU [miscellaneous]: 16.41% ± 0.38% (σ=0.62%)
best_bvv_zh MMLU [high_school_chemistry]: 19.36% ± 1.74% (σ=2.80%)
best_bvv_zh MMLU [marketing]: 21.84% ± 0.91% (σ=1.46%)
best_bvv_zh MMLU [professional_law]: 19.24% ± 0.35% (σ=0.56%)
best_bvv_zh MMLU [management]: 22.23% ± 2.16% (σ=3.49%)
best_bvv_zh MMLU [college_physics]: 15.69% ± 2.38% (σ=3.85%)
best_bvv_zh MMLU [jurisprudence]: 19.54% ± 1.84% (σ=2.97%)
best_bvv_zh MMLU [world_religions]: 13.80% ± 1.47% (σ=2.37%)
best_bvv_zh MMLU [sociology]: 19.10% ± 1.16% (σ=1.88%)
best_bvv_zh MMLU [us_foreign_policy]: 16.60% ± 1.92% (σ=3.10%)
best_bvv_zh MMLU [high_school_macroeconomics]: 22.59% ± 0.78% (σ=1.25%)
best_bvv_zh MMLU [computer_security]: 15.10% ± 1.70% (σ=2.74%)
best_bvv_zh MMLU [moral_scenarios]: 21.39% ± 0.77% (σ=1.24%)
best_bvv_zh MMLU [moral_disputes]: 19.31% ± 0.41% (σ=0.67%)
best_bvv_zh MMLU [electrical_engineering]: 16.48% ± 1.44% (σ=2.32%)
best_bvv_zh MMLU [astronomy]: 20.33% ± 1.51% (σ=2.43%)
best_bvv_zh MMLU [college_biology]: 21.04% ± 1.43% (σ=2.30%)
best_bvv_zh MMLU: 19.42% ± 0.21% (σ=0.33%)
best_bvv_zh ARC-e: 19.51% ± 0.88% (σ=1.43%)
best_bvv_zh ARC-c: 21.94% ± 1.13% (σ=1.83%)
best_bvv_zh C-SENSE: 19.62% ± 0.74% (σ=1.19%)
best_bvv_zh SQUAD: 15.12% ± 1.24% (σ=2.00%)
best_bvv_zh BLEU [en-ru]: 3.15% ± 0.20% (σ=0.32%)
best_bvv_zh BLEU [ru-en]: 3.04% ± 0.34% (σ=0.55%)
best_bvv_zh BLEU [en-zh]: 1.41% ± 0.25% (σ=0.40%)
best_bvv_zh BLEU [zh-en]: 7.78% ± 0.40% (σ=0.64%)
