Keywords: concepts, alignment, concept understanding, same/different
Abstract: As advanced AI systems such as generative foundation models exhibit an increasingly rich range of behaviors, a challenge for AI alignment and safety research is how to systematize and characterize these behaviors in a way that helps us understand them and develop safer models. One key question on the path toward this goal is whether AI systems conceptually understand the world in the same way that humans do. A classic family of tasks used to probe concept understanding in humans and non-human animals is same/different tasks, which test for an understanding of the abstract concepts of "sameness" and "difference" across varied stimuli. Taking inspiration from these studies of concept learning in humans and non-human animals, we present experimental results that investigate how well text-to-image (T2I) models understand same/different concepts. We show that while T2I models demonstrate some understanding of same/different concepts, this understanding varies significantly across attributes of sameness and difference (such as texture, color, rotation, and size). We discuss how revealing such behavioral differences can help us design more robust model training and evaluation protocols. Finally, we explain how analogies between behavioral analyses of concept learning in humans, non-human animals, and models can help us better understand the increasingly varied and often unpredictable behaviors that models exhibit.
Submission Number: 89