Keywords: 3D vision, Bayesian models, Supervised learning, Motion, Canonical computations
TL;DR: We train Bayesian observer models to perform 3D motion estimation tasks, and compare the resulting structure and behavior to canonical circuit models and human psychophysical data.
Abstract: Estimating the motion of objects in depth is important for behavior, and is strongly supported by binocular visual cues. To understand how the brain should estimate motion in depth, we develop image-computable ideal observer models from naturalistic binocular video clips for two 3D motion tasks. The observers spatio-temporally filter the videos and non-linearly decode 3D motion from the filter responses. The optimal filters and decoder are dictated by the task-relevant statistics and are specific to each task. Multiple findings emerge. First, two distinct filter types are spontaneously learned for each task. For 3D speed estimation, filters emerge for processing either changing disparities over time (CDOT) or interocular velocity differences (IOVD), cues used by humans. For 3D direction estimation, filters emerge for discriminating either left-right or towards-away motion. Second, the filter responses, conditioned on the task-relevant latent variable, are well-described as jointly Gaussian, and their covariance carries the information about the latent variable. Quadratic combination of the filter responses is thus necessary for optimal decoding. Finally, the ideal observer yields counter-intuitive patterns of performance like those exhibited by humans. Important characteristics of human 3D motion processing and estimation may therefore result from optimal information processing in the early visual system.
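The second finding, that the task information lives in the class-conditional covariance of jointly Gaussian filter responses, implies a quadratic decoder: the log-likelihood of a response vector under a zero-mean Gaussian is quadratic in the responses. Below is a minimal illustrative sketch of this decoding principle; the covariances and class structure are invented for illustration and are not the paper's learned filters or tasks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not the paper's actual model): two latent motion
# classes, each producing zero-mean filter responses. The class-conditional
# covariances differ, so the information is in the covariance, not the mean.
cov = [np.array([[1.0, 0.8], [0.8, 1.0]]),
       np.array([[1.0, -0.8], [-0.8, 1.0]])]

def quadratic_log_likelihoods(r, covs):
    """Log-likelihood of response vector r under each zero-mean Gaussian.
    Quadratic in r: -0.5 * r^T Sigma_k^{-1} r - 0.5 * log|Sigma_k| + const."""
    return np.array([-0.5 * (r @ np.linalg.solve(c, r))
                     - 0.5 * np.linalg.slogdet(c)[1]
                     for c in covs])

def decode(r, covs):
    """Maximum-likelihood class: quadratic combination of filter responses."""
    return int(np.argmax(quadratic_log_likelihoods(r, covs)))

# Responses drawn from each class are mostly decoded correctly, even though
# both classes have identical (zero) means.
n = 2000
correct = 0
for k in (0, 1):
    samples = rng.multivariate_normal(np.zeros(2), cov[k], size=n)
    correct += sum(decode(r, cov) == k for r in samples)
accuracy = correct / (2 * n)
```

No linear readout of these responses can separate the classes, since the class means are identical; only the quadratic form distinguishes them, which is the sense in which quadratic combination is necessary for optimal decoding.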