Keywords: Large Language Models, alignment, human values, safety standards, robustness, model evaluation
TL;DR: This paper introduces MEAL, a framework for systematically evaluating and comparing alignment techniques for Large Language Models along four dimensions: alignment detection, alignment quality, computational efficiency, and robustness.
Abstract: As Large Language Models (LLMs) become increasingly integrated into real-world applications, ensuring that their outputs align with human values, organizational norms, and safety standards has become a central pursuit in machine learning. The field has developed diverse alignment approaches, including traditional fine-tuning methods (e.g., RLHF, instruction tuning), post-hoc correction systems, and inference-time interventions, each with distinct advantages and limitations. However, the lack of a unified evaluation framework makes it difficult to systematically compare these techniques and thereby guide implementation and deployment decisions. This paper introduces MEAL, a Multi-dimensional Evaluation of ALignment Techniques for LLMs: a comprehensive framework that enables systematic comparison across these major alignment techniques. The framework assesses methods along four key dimensions: alignment detection, alignment quality, computational efficiency, and robustness. To demonstrate its utility, we conduct a series of experiments across diverse base models and alignment techniques. We describe these experiments and their results, identify the strengths and limitations of current state-of-the-art models, and provide insights into the trade-offs among these alignment techniques.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 13138