Keywords: goal-directedness, intentionality, causal modeling, mechanistic interpretability
TL;DR: This paper argues that goal-directedness cannot be measured objectively and outlines new directions for modeling goal-directed behavior as emerging from dynamic interaction rather than from an explicit goal representation.
Abstract: Our ability to predict the behavior of complex agents turns on the attribution of goals. Probing for goal-directed behavior comes in two flavors: behavioral and mechanistic. The former proposes that goal-directedness can be estimated through behavioral observation, whereas the latter attempts to probe for goals in internal model states. We work through the assumptions behind both approaches, identifying technical and conceptual problems that arise from formalizing goals in agent systems. We arrive at the perhaps surprising position that goal-directedness cannot be measured objectively. We outline new directions for modeling goal-directedness as an emergent property of dynamic, multi-agent systems.
Submission Number: 192