Keywords: Large language models, evaluation, alignment, safety standards, robustness, human values
Abstract: As Large Language Models (LLMs) become increasingly integrated into real-world applications, ensuring that their outputs align with human values, organizational norms, and safety standards has become a central pursuit in machine learning. The field has developed diverse alignment approaches, including traditional fine-tuning methods (e.g., RLHF, instruction tuning), post-hoc correction systems, and inference-time interventions, each with distinct advantages and limitations. However, the lack of unified evaluation frameworks makes it difficult to systematically compare these techniques and to guide implementation and deployment decisions. This paper introduces MEAL: A Multi-dimensional Evaluation of ALignment Techniques for LLMs, a comprehensive evaluation framework that enables systematic comparison across major alignment techniques. The framework assesses methods along four key dimensions: alignment detection, alignment quality, computational efficiency, and robustness. To demonstrate its utility, we run a series of experiments across diverse base models and alignment techniques. We describe these experiments and their results, identify the strengths and limitations of current state-of-the-art models, and provide insights into the trade-offs among these alignment techniques.
Submission Number: 114