Abstract: For artificial intelligence (AI), understanding and interacting with the physical world requires forming internal representations of its 3D geometry, often through visual perception. Hence, in this thesis, we investigate representation learning of 3D shape via computer vision (CV). In particular, we explore the Analysis-by-Synthesis (AbS) paradigm, which frames perceptual inference as the inversion of a generative process. This provides two key advantages: weak supervision, via the reconstructive signal, and the opportunity for disentanglement, via regularizing priors. The resulting representations are more versatile, controllable, and widely applicable. We begin by defining a prior on the deformation space of shapes, separating intrinsic from isometric changes through an AbS-trained generative model. Crucially, our prior relies purely on information-theoretic disentanglement of differential geometric quantities. Thus, unlike existing works, our approach, the Geometrically Disentangled Variational Autoencoder (GDVAE), is fully unsupervised. The resulting representation can then be applied to pose-aware shape transfer and retrieval, as well as spectral geometry processing. Next, we consider the classical inverse graphics interpretation of CV, learning a weakly supervised modality translator between 2D images and disentangled 3D graphics codes. The mappings, trained via AbS-based cycle-consistency constraints and distribution-matching priors, implement single-image 3D reconstruction (SI3DR) in one direction and 3D-aware generative image modelling in the other. Unlike prior methods, our Cyclic Generative Renderer (CGR) requires only unpaired images and shapes, greatly expanding its applicability and scalability. Finally, motivated by the primacy of differentiable rendering in modern AbS-based CV, we construct a novel representation of 3D shape, the Probabilistic Directed Distance Field (PDDF), capable of single-query geometric neural rendering.
We theoretically investigate the PDDF, particularly its geometric properties and view consistency. Then, we apply PDDFs to a variety of problems, including shape fitting, SI3DR, 3D-aware generative modelling, and light transport. Altogether, our focus on 3D geometry, encompassing its deformation structure, its rendering into 2D, and its inference from images, continues a long line of effort in CV. Emphasizing disentanglement and weak supervision encourages useful latent decompositions and improves scalability. Our hope is to advance the frontier of shape representation learning, towards intelligent agents that better understand the geometric structure of the world.
External IDs: dblp:phd/ca/AumentadoArmstrong24