Vertical Moral Growth: A Novel Developmental Framework for Human Feedback Quality in AI Alignment

Published: 10 Jun 2025, Last Modified: 30 Jun 2025 · MoFA Poster · CC BY 4.0
Keywords: Human Feedback Models, AI Alignment, Feedback Quality, Developmental Psychology, Moral Reasoning, RLHF, Preference Learning, Model-Specific Alignment
TL;DR: We reconceptualize human feedback quality through developmental psychology, introducing VMG—a framework targeting Stage 6 moral reasoning over aggregating all preferences. Initial tests: 80% deception reduction with 50 examples (model-dependent).
Abstract: Current models of human feedback in AI alignment assume that preferences are static, unbiased, and uniformly reliable across annotators---assumptions that fail to account for the developmental nature of moral reasoning. We introduce Vertical Moral Growth (VMG), a framework that reconceptualizes feedback quality through Kohlberg's stages of moral development, proposing that targeting Stage 6 universal ethical principles can yield higher-quality alignment than aggregating all feedback equally. As an initial validation, we show that experiential learning on just 50 expert-validated moral dilemmas elevated GPT-4o to consistent Stage 6 reasoning and reduced deceptive behaviors by 80\% under adversarial conditions. However, Llama3-70B exhibited catastrophic forgetting despite moral gains, revealing critical model-dependent effects. By reframing human feedback through developmental psychology, VMG offers a complementary theoretical lens to existing methods, transforming the annotation problem from "what do humans prefer?" to "what represents the highest quality of human moral reasoning?"---opening new avenues for principled approaches to AI alignment across diverse model architectures.
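To make the core idea concrete, here is a minimal, hypothetical sketch (not the paper's released code) of how "targeting Stage 6 reasoning" could differ from aggregating all preference feedback equally: preference pairs are filtered by an assessed Kohlberg stage of the annotator's rationale before being used for fine-tuning. The field names `kohlberg_stage` and the threshold `min_stage` are illustrative assumptions, not part of the paper.

```python
# Hypothetical illustration of VMG's reframing: keep only preference pairs
# whose annotator rationale reflects the target moral stage, instead of
# aggregating every pair as standard RLHF data curation would.
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    kohlberg_stage: int  # assessed stage (1-6) of the rater's moral rationale


def select_by_stage(pairs: List[PreferencePair], min_stage: int = 6) -> List[PreferencePair]:
    """Retain only pairs whose rationale reaches the target developmental stage."""
    return [p for p in pairs if p.kohlberg_stage >= min_stage]


if __name__ == "__main__":
    data = [
        PreferencePair("Should the model reveal a user's secret?", "No, ...", "Yes, ...", 6),
        PreferencePair("Is it fine to exaggerate to please the user?", "No, ...", "Yes, ...", 3),
    ]
    stage6_only = select_by_stage(data)
    print(f"{len(stage6_only)} of {len(data)} pairs retained for fine-tuning")
```

Under this reading, the annotation question shifts from collecting more preferences to curating the small subset (e.g., the 50 expert-validated dilemmas reported above) that exemplifies the highest stage of moral reasoning.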
Submission Number: 17