How do humans and machine learning models track multiple objects through occlusion?

Published: 18 Oct 2022, Last Modified: 05 May 2023, SVRHM Poster, Readers: Everyone
Keywords: multiple object tracking, human behavior, machine learning models, tasks
TL;DR: We introduce a novel task and use it to show that pre-trained supervised machine learning models track multiple objects at superhuman levels without occlusion but at subhuman levels when faced with longer stretches of occlusion.
Abstract: Interacting with a complex environment often requires us to track multiple task-relevant objects, not all of which are continually visible. The cognitive literature has focused on tracking a subset of visible, identical abstract objects (e.g., circles), isolating the tracking component from its context in real-world experience. In the real world, object tracking is harder in that objects may not be continually visible, and easier in that objects differ in appearance, so their recognition can rely on both remembered position and current appearance. Here we introduce a generalized task that combines tracking and recognition of valued objects that move in complex trajectories and frequently disappear behind occluders. Humans and models (from the computer-vision literature on object tracking) performed tasks varying widely in the number of objects to be tracked, the number of distractors, the presence of an occluder, and the appearance similarity between targets and distractors. We replicated results from the human literature, including a deterioration of tracking performance with the number and similarity of targets and distractors. In addition, we found that increasing levels of occlusion reduce performance. All models tested here behaved in qualitatively different ways from human observers, showing superhuman performance for large numbers of targets and subhuman performance under conditions of occlusion. Our framework will enable future studies to connect the human behavioral and engineering literatures, both to test image-computable multiple-object-tracking models as models of human performance and to investigate how tracking and recognition interact under natural conditions of dynamic motion and occlusion.