Abstract: The impressive performance of deep convolutional neural
networks in single-view 3D reconstruction suggests that these models
perform non-trivial reasoning about the 3D structure of the output space.
However, recent work has challenged this belief, showing that complex
encoder-decoder architectures perform similarly to nearest-neighbor baselines or simple linear decoder models that exploit large amounts of per-category data in standard benchmarks. A more realistic setting, however,
involves inferring the 3D shape of objects with few available examples;
this requires a model that can successfully generalize to novel object
classes. In this work we demonstrate experimentally that naive baselines
fail in this few-shot learning setting, where the network must learn informative shape priors for inference of new categories. We propose three ways to learn a class-specific global shape prior directly from data. Using
these techniques, our learned prior is able to capture multi-scale information about the 3D shape and to account for intra-class variability by virtue of an implicit compositional structure. Experiments on the popular ShapeNet
dataset show that our method outperforms a zero-shot baseline by over
50% and the current state of the art by over 10% in relative performance in the few-shot setting.