TL;DR: A method that fits small, accurate primitive representations, including a set-difference operator, to images of scenes.
Abstract: Describing a scene in terms of primitives -- geometrically simple shapes that offer a parsimonious but accurate abstraction of structure -- is an established and difficult fitting problem. Different scenes require different numbers of primitives, and these primitives interact strongly. Existing methods are evaluated by predicting depth, normals and segmentation from the primitives, then evaluating the accuracy of those predictions. The state-of-the-art method uses a learned regression procedure to predict a start point consisting of a fixed number of primitives, followed by a descent method to refine the geometry and remove redundant primitives. Constructive solid geometry (CSG) representations are significantly enhanced by a set-differencing operation. Our representation incorporates $\textit{negative}$ primitives, which are differenced from the positive primitives. These notably enrich the geometry that the model can encode, while complicating the fitting problem. This paper demonstrates a method that can (a) incorporate these negative primitives and (b) choose the overall number of positive and negative primitives by ensembling. Extensive experiments on the standard NYUv2 dataset confirm that (a) this approach results in substantial improvements in depth representation and segmentation over the prior state of the art and (b) negative primitives make a notable contribution to accuracy. Our method is robustly applicable across datasets: for the first time, we evaluate primitive prediction on LAION images. Code will be released upon acceptance of the paper.
Primary Area: Applications->Computer Vision
Keywords: Convex Decomposition, 3D primitives, ensembling
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Flagged For Ethics Review: true
Submission Number: 7118