Keywords: Expert-Level LLM Alignment, RLHF, Preference Optimization
Abstract: Large Language Models (LLMs) demonstrate remarkable capabilities, yet aligning their behavior with human preferences remains both challenging and essential. Human evaluation of model outputs is often costly and must account for diverse user preferences. To address this, recent methods leverage LLM-as-a-judge to assess alignment quality, an approach that achieves high agreement with human judgments while being more cost-effective. However, existing widely used benchmarks such as AlpacaEval 2.0 rely primarily on simplistic, instruction-following data pairs derived from general user preferences. These benchmarks are insufficient for evaluating complex, domain-specific, and nuanced scenarios, and many are outdated for current alignment evaluation.
To overcome these limitations, we introduce WorldAlignment, an expert-level, multi-domain human preference benchmark designed for efficient and comprehensive evaluation of alignment capabilities. WorldAlignment provides more challenging, higher-quality, and more diverse preference pairs across multiple domains, enabling more robust alignment assessment. Our evaluation results show that WorldAlignment offers comprehensive insights into both state-of-the-art (SoTA) and post-trained models, establishing a modern benchmark for domain-oriented alignment. Furthermore, WorldAlignment supports evaluation across various dimensions, including instruction-following, mathematical reasoning, and code-related tasks, providing a holistic view of alignment performance.
Through our evaluation, we find that several state-of-the-art alignment-tuned models still exhibit substantial performance gaps relative to GPT-4-level models on our benchmark, highlighting critical limitations and directions for future improvement. Our code and data will be available at https://anonymous.4open.science/r/WorldAlignment.
Primary Area: datasets and benchmarks
Submission Number: 3778