best_bvv_moe Total parameters: 0.9B
best_bvv_moe MMLU [high_school_european_history]: 9.94% ± 1.29% (σ=2.09%)
best_bvv_moe MMLU [business_ethics]: 23.50% ± 1.42% (σ=2.29%)
best_bvv_moe MMLU [clinical_knowledge]: 27.81% ± 1.03% (σ=1.65%)
best_bvv_moe MMLU [medical_genetics]: 22.80% ± 2.49% (σ=4.02%)
best_bvv_moe MMLU [high_school_us_history]: 13.14% ± 1.31% (σ=2.11%)
best_bvv_moe MMLU [high_school_physics]: 23.84% ± 1.48% (σ=2.39%)
best_bvv_moe MMLU [high_school_world_history]: 12.66% ± 0.84% (σ=1.35%)
best_bvv_moe MMLU [virology]: 24.28% ± 1.04% (σ=1.68%)
best_bvv_moe MMLU [high_school_microeconomics]: 27.48% ± 1.79% (σ=2.88%)
best_bvv_moe MMLU [econometrics]: 24.91% ± 1.01% (σ=1.63%)
best_bvv_moe MMLU [college_computer_science]: 24.20% ± 2.46% (σ=3.97%)
best_bvv_moe MMLU [high_school_biology]: 27.71% ± 0.61% (σ=0.98%)
best_bvv_moe MMLU [abstract_algebra]: 21.80% ± 1.59% (σ=2.56%)
best_bvv_moe MMLU [professional_accounting]: 24.36% ± 1.27% (σ=2.04%)
best_bvv_moe MMLU [philosophy]: 23.38% ± 1.13% (σ=1.83%)
best_bvv_moe MMLU [professional_medicine]: 32.17% ± 1.18% (σ=1.91%)
best_bvv_moe MMLU [nutrition]: 26.31% ± 1.28% (σ=2.06%)
best_bvv_moe MMLU [global_facts]: 23.00% ± 1.57% (σ=2.53%)
best_bvv_moe MMLU [machine_learning]: 18.21% ± 2.31% (σ=3.73%)
best_bvv_moe MMLU [security_studies]: 25.63% ± 1.29% (σ=2.09%)
best_bvv_moe MMLU [public_relations]: 26.18% ± 1.23% (σ=1.98%)
best_bvv_moe MMLU [professional_psychology]: 21.34% ± 0.80% (σ=1.29%)
best_bvv_moe MMLU [prehistory]: 22.90% ± 0.87% (σ=1.40%)
best_bvv_moe MMLU [anatomy]: 22.52% ± 1.80% (σ=2.91%)
best_bvv_moe MMLU [human_sexuality]: 24.20% ± 1.83% (σ=2.96%)
best_bvv_moe MMLU [college_medicine]: 25.32% ± 2.05% (σ=3.31%)
best_bvv_moe MMLU [high_school_government_and_politics]: 29.02% ± 1.21% (σ=1.95%)
best_bvv_moe MMLU [college_chemistry]: 27.10% ± 3.12% (σ=5.03%)
best_bvv_moe MMLU [logical_fallacies]: 19.57% ± 1.09% (σ=1.77%)
best_bvv_moe MMLU [high_school_geography]: 27.88% ± 1.69% (σ=2.73%)
best_bvv_moe MMLU [elementary_mathematics]: 21.83% ± 1.17% (σ=1.88%)
best_bvv_moe MMLU [human_aging]: 20.94% ± 1.71% (σ=2.76%)
best_bvv_moe MMLU [college_mathematics]: 24.70% ± 2.56% (σ=4.12%)
best_bvv_moe MMLU [high_school_psychology]: 27.71% ± 0.99% (σ=1.60%)
best_bvv_moe MMLU [formal_logic]: 24.84% ± 2.05% (σ=3.31%)
best_bvv_moe MMLU [high_school_statistics]: 33.29% ± 1.21% (σ=1.95%)
best_bvv_moe MMLU [international_law]: 14.38% ± 1.73% (σ=2.80%)
best_bvv_moe MMLU [high_school_mathematics]: 23.22% ± 1.16% (σ=1.87%)
best_bvv_moe MMLU [high_school_computer_science]: 19.70% ± 2.43% (σ=3.93%)
best_bvv_moe MMLU [conceptual_physics]: 24.77% ± 1.44% (σ=2.32%)
best_bvv_moe MMLU [miscellaneous]: 20.73% ± 0.60% (σ=0.96%)
best_bvv_moe MMLU [high_school_chemistry]: 25.12% ± 1.70% (σ=2.74%)
best_bvv_moe MMLU [marketing]: 23.25% ± 1.20% (σ=1.93%)
best_bvv_moe MMLU [professional_law]: 18.90% ± 0.46% (σ=0.74%)
best_bvv_moe MMLU [management]: 29.42% ± 2.35% (σ=3.79%)
best_bvv_moe MMLU [college_physics]: 27.06% ± 2.34% (σ=3.78%)
best_bvv_moe MMLU [jurisprudence]: 22.41% ± 2.35% (σ=3.79%)
best_bvv_moe MMLU [world_religions]: 20.88% ± 1.17% (σ=1.89%)
best_bvv_moe MMLU [sociology]: 23.48% ± 1.09% (σ=1.76%)
best_bvv_moe MMLU [us_foreign_policy]: 20.30% ± 1.24% (σ=2.00%)
best_bvv_moe MMLU [high_school_macroeconomics]: 29.77% ± 1.20% (σ=1.93%)
best_bvv_moe MMLU [computer_security]: 18.50% ± 1.71% (σ=2.77%)
best_bvv_moe MMLU [moral_scenarios]: 24.49% ± 0.74% (σ=1.20%)
best_bvv_moe MMLU [moral_disputes]: 21.33% ± 0.84% (σ=1.35%)
best_bvv_moe MMLU [electrical_engineering]: 23.10% ± 1.41% (σ=2.27%)
best_bvv_moe MMLU [astronomy]: 26.71% ± 1.29% (σ=2.08%)
best_bvv_moe MMLU [college_biology]: 23.54% ± 1.50% (σ=2.41%)
best_bvv_moe MMLU: 23.44% ± 0.18% (σ=0.28%)
best_bvv_moe ARC-e: 23.74% ± 0.63% (σ=1.02%)
best_bvv_moe ARC-c: 25.28% ± 1.29% (σ=2.07%)
best_bvv_moe C-SENSE: 19.69% ± 0.70% (σ=1.13%)
best_bvv_moe SQUAD: 19.73% ± 0.90% (σ=1.45%)
best_bvv_moe BLEU [en-ru]: 6.52% ± 0.38% (σ=0.62%)
best_bvv_moe BLEU [ru-en]: 6.22% ± 0.23% (σ=0.38%)
best_bvv_moe BLEU [en-zh]: 2.93% ± 0.21% (σ=0.34%)
best_bvv_moe BLEU [zh-en]: 4.95% ± 0.36% (σ=0.59%)
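
Note on reading the "± x% (σ=y%)" columns: the log does not say how the ± value is derived from σ. Across the rows above, the ratio of ± to σ is close to 0.62, which would be consistent with a 95% normal-approximation interval over 10 independent runs (1.96 / sqrt(10) ≈ 0.62). Both the run count and the z-value are assumptions inferred from that ratio, not facts stated in the log. The sketch below only checks that assumption against a few of the reported rows; `ci_half_width` is a hypothetical helper name.

```python
import math


def ci_half_width(sigma: float, n_runs: int = 10, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval.

    ASSUMPTION: n_runs=10 and z=1.96 are inferred from the log's
    ±/σ ratio; the log itself does not state either value.
    """
    return z * sigma / math.sqrt(n_runs)


# (reported ±, reported σ) copied from the log above, in percent.
rows = {
    "MMLU [business_ethics]": (1.42, 2.29),
    "ARC-c": (1.29, 2.07),
    "BLEU [en-ru]": (0.38, 0.62),
}

for name, (reported, sigma) in rows.items():
    reconstructed = ci_half_width(sigma)
    # σ is rounded to two decimals in the log, so allow a small tolerance.
    matches = abs(reconstructed - reported) <= 0.02
    print(f"{name}: reported ±{reported:.2f}, "
          f"reconstructed ±{reconstructed:.2f}, consistent={matches}")
```

If the evaluation actually used a different number of runs or a t-based interval, the reconstructed values would drift from the reported ones, so a mismatch on other rows would be a sign that the assumption does not hold.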
