Abstract: The ML literature contains many distinct concepts falling under the heading of ‘AI alignment’. After noting three concepts of AI alignment and situating these ideals in the context of their corresponding research programs, we claim that realistic interventions may promote ‘AI alignment’ under one conception while being actively counterproductive from the perspective of others. We suggest that tensions between alignment ideals emerge due to differences in background threat-models, alongside differences in both methodological and normative orientations. In light of our analysis, researchers who take their work to further the goal of ‘AI alignment’ should do three things. First, they should distinguish between ‘AI alignment’ as a high-level ideal and the specific ‘alignment proxies’ used in empirical research. Second, they should use more granular concepts to identify both the source and the nature of possible AI harms and benefits. Third, they should explicitly specify the non-technical background commitments motivating specific conceptions of ‘AI alignment’.
Lay Summary: The ML literature contains many distinct concepts falling under the heading of 'AI alignment'. After noting three concepts of AI alignment in the context of their corresponding research programs, we claim that realistic interventions may promote 'AI alignment' under one conception while being actively counterproductive from the perspective of others. We suggest that tensions between alignment ideals emerge due to differences in background threat-models, alongside differences in normative orientations. In light of our analysis, researchers aiming to further the goal of 'AI alignment' should do five things. First, they should not conflate distinctions of policy with distinctions of scientific scope; second, they should acknowledge methodological disagreements explicitly; third, they should distinguish between 'AI alignment' as a high-level ideal and the specific 'alignment proxies' used in empirical research; fourth, they should use more granular concepts to identify both the source and the nature of possible AI harms and benefits; fifth, they should explicitly acknowledge the diversity of 'alignment' concepts, both in empirical work and in communication with non-technical audiences.
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: alignment problem, philosophy of science, AI safety, AI ethics
Originally Submitted PDF: pdf
Submission Number: 902