Unsupervised Discovery and Composition of Object Light Fields

TMLR Paper 382 Authors

24 Aug 2022 (modified: 17 Sept 2024) · Rejected by TMLR · CC BY 4.0
Abstract: Neural scene representations, both continuous and discrete, have recently emerged as a powerful new paradigm for 3D scene understanding. Recent efforts have tackled unsupervised discovery of object-centric neural scene representations. However, the high cost of ray-marching, exacerbated by the fact that each object representation has to be ray-marched separately, leads to insufficiently sampled radiance fields and thus noisy renderings, poor frame rates, and high memory and time complexity during training and rendering. Here, we propose to represent objects in an object-centric, compositional scene representation as light fields. We propose a novel light field compositor module that enables reconstructing the global light field from a set of object-centric light fields. Dubbed Compositional Object Light Fields (COLF), our method enables unsupervised learning of object-centric neural scene representations, state-of-the-art reconstruction and novel view synthesis performance on standard datasets, and rendering and training speeds orders of magnitude faster than existing 3D approaches.
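To make the compositor idea in the abstract concrete, below is a minimal PyTorch sketch, not the paper's actual architecture: it assumes each object is represented by a light field network mapping a 6-D (e.g., Plücker-parameterized) ray directly to a color and a blend logit, with no ray-marching, and that the global light field is recovered by a per-ray softmax blend over objects. The names `ObjectLightField` and `compose_light_fields`, and the softmax parameterization, are hypothetical illustrations of one plausible compositor.

```python
import torch
import torch.nn as nn

class ObjectLightField(nn.Module):
    """Hypothetical per-object light field: maps a 6-D ray parameterization
    directly to an RGB color and a scalar blend logit (no ray-marching)."""
    def __init__(self, ray_dim=6, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ray_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 blend logit
        )

    def forward(self, rays):              # rays: (N, 6)
        out = self.net(rays)
        return out[..., :3], out[..., 3]  # colors (N, 3), logits (N,)

def compose_light_fields(object_fields, rays):
    """Blend per-object colors into a global light field evaluation via a
    per-ray softmax over object logits -- one of several plausible choices."""
    colors, logits = zip(*(f(rays) for f in object_fields))
    colors = torch.stack(colors, dim=0)                         # (K, N, 3)
    weights = torch.softmax(torch.stack(logits, dim=0), dim=0)  # (K, N)
    return (weights.unsqueeze(-1) * colors).sum(dim=0)          # (N, 3)

# Usage: evaluate the composed light field for a batch of 1024 rays.
fields = [ObjectLightField() for _ in range(4)]  # K = 4 object slots
rays = torch.randn(1024, 6)                      # placeholder ray batch
pixels = compose_light_fields(fields, rays)      # (1024, 3)
```

Because each object field is a single feed-forward evaluation per ray, composing K objects costs K network passes per ray rather than K separate ray-marching loops, which is the source of the speed advantage the abstract claims.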
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Relationship to layered-depth images (2gPY): Added a sub-section to the related work section introducing this relationship (page 3, sub-section "Layered Representations for View Synthesis").
- Discussion of more recent fast NeRF training as a potential solution to slow object-NeRFs (z9ns): Expanded on why these methods are not viable solutions for our problem in the related work section (page 3, section "3D Compositional Scene Representations", from "While recent work" to "yet been demonstrated").
- More thorough discussion of design choices for the light field compositor (z9ns): Elaborated on the theoretical motivation behind the compositor and on the current parameterization over other solutions we considered (page 5, paragraph just below Equation 2).
- Consider moving experiments 5.5 and 5.4 from the appendix to the main paper (z9ns): Merged 5.5 from the appendix into the main paper in Figure 4 (page 7), with a reference on page 5 (just above Section 3.3). We kept 5.4 in the supplement, as we felt it was not a significant enough experiment to merit space in the main paper.
- Shadow-based segmentation results concerning and confusing (2gPY, 2uG2): Elaborated on the shadow-based model segmentations and their correctness in comparison to standard annotated segmentation benchmarks (page 10, "Results" of Section 4.2, from "Although some" to "removed as well").
- Hints at applications to real-world scenes (2gPY): Expanded on potential applications and on concurrent work improving the robustness of object encoders to real-world scenes, which we believe will benefit from our lightweight 3D representation when applied to large-scale real-world datasets (page 12, section "Discussion", from "These concurrent works" to "can be large").
- Discrepancy in FG-ARI values of uORF in Table 2 (2uG2): Fixed these results in Table 2; we thank 2uG2 for catching our error.
Assigned Action Editor: ~Antoni_B._Chan1
Submission Number: 382