Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI

Hadassah Harland; Richard Dazeley; Peter Vamplew; Hashini Senaratne; Bahareh Nakisa; Francisco Cruz

Published: 10 Oct 2024, Last Modified: 15 Nov 2024, Venue: Pluralistic-Alignment 2024, License: CC BY 4.0

Keywords: alignment, pluralistic alignment, multi-objective reinforcement learning

TL;DR: This paper describes an approach for adapting to user preferences through Multi-Objective Reinforcement Learning (MORL)

Abstract: Emerging research in Pluralistic AI alignment seeks to address how to design and deploy intelligent systems in accordance with diverse human needs and values. We contribute a potential approach for aligning AI with diverse and shifting user preferences through Multi-Objective Reinforcement Learning (MORL), via post-learning policy selection adjustment. This paper introduces the proposed framework, outlines its anticipated advantages and assumptions, and discusses technical details for implementation. We also examine the broader implications of adopting a retroactive alignment approach from a sociotechnical systems perspective.

Submission Number: 18
