Abstract: As big models demonstrate remarkable performance across diverse tasks, concerns have been raised about their potential risks and social harms. Extensive efforts have been devoted to aligning big models with humans to ensure their responsible development and to maximize their benefit to humanity. Nevertheless, the basic question of "what to align with" remains largely unexplored. It is critical to precisely define the objectives that big models should pursue, since aligning with inappropriate goals can lead to disaster; for example, a chatbot may produce abusive or biased content when it merely follows user instructions to interact freely. This paper conducts a comprehensive survey of different alignment goals, tracing their evolution to identify the most appropriate goal for big models. Specifically, we categorize existing alignment goals into four levels: human instructions, human preferences, value principles, and basic values, revealing a progression from basic abilities to higher-level value concepts. For each goal, we further elaborate on its definition, how it is represented, and how it is evaluated. Positing basic values as a promising goal, we discuss the remaining challenges and future research directions.
Paper Type: long
Research Area: Ethics, Bias, and Fairness
Contribution Types: Surveys
Languages Studied: English, Multilingual