Abstract: We present NeSF, a method for producing 3D semantic fields from posed RGB images alone. In place of classical 3D representations, our method builds on recent work in neural fields wherein 3D structure is captured by point-wise functions. We leverage this methodology to recover 3D density fields upon which we then train a 3D semantic segmentation model supervised by posed 2D semantic maps. Despite being trained on 2D signals alone, our method is able to generate 3D-consistent semantic maps from novel camera poses and can be queried at arbitrary 3D points. Notably, NeSF is compatible with any method producing a density field. Our empirical analysis shows that NeSF achieves quality comparable to competitive 2D and 3D semantic segmentation baselines on complex, realistically rendered scenes and significantly outperforms a comparable neural radiance field-based method on a series of tasks requiring 3D reasoning. Our method is the first to learn semantics by recognizing patterns in the geometry stored within a 3D neural field representation. NeSF is trained using purely 2D signals and requires as few as one labeled image per scene at train time. No semantic input is required for inference on novel scenes.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Thank you to the Action Editor for the consolidated list of requested modifications. We have addressed all requested modifications and included a few additional ones at our discretion, inspired by discussion with the reviewers.

**Requested Edits**

1. (kHpB, KCQV) Please include in the supplementary the SparseConvNet using noisy NeRF Geometry baseline discussed in kHpB Q2 (proposal #1 - the one that has already been completed; no need to do any new ones). This was also the one agreed upon with KCQV in Q3.
   >> Experiment results and description included in **Supplement B.2 Ablations**, see **Comparison to NeRF SparseConvNet** on page 28.
2. (KQCV) Please address KQCV's Q2 request for a simple ablation (i.e., one that takes the second-to-last feature layer).
   >> Experiment results and description included in **Supplement B.2 Ablations**, see **Addition of RGB features** on page 29.
3. (kHpB) Please add to Figure 4's caption to help readers understand it better. An abbreviated version of what was written in the response to kHpB on 17 May is perfect.
   >> Updated the **Figure 4** caption with an additional last sentence addressing non-monotonicity, see top of page 13. Additionally extended the section **Sensitivity to data scarcity** on page 12 from "We note, however .." through "performance on novel scenes."
4. (u1Zd) Please add a few short sentences to address the discussion regarding u1Zd Q1 ("nerf took us away from the clutches of synthetic data"). The editor agrees with u1Zd that a simple demo could be nice, but also not necessary.
   >> Edited the last paragraph of the **Introduction**. Described the limitations of current large-scale datasets and noted our decision to treat additional method development to overcome these issues as beyond the scope of this work. See page 3 starting from "As large scale datasets .." to " .. beyond the scope of this work".
5. (u1Zd) Please add commentary about data augmentation (Q4) and new semantic classes (Q6).
   >> Edited Section **3 Method**, subsection **data augmentation**, to include additional description of the data augmentation method. See page 7 starting from "Subsequently, the spatial reasoning .." through "remainder of algorithm remains as before."
   >> Edited Section **5.3 Ablation Studies**, end of last paragraph, to describe our expectations around dataset shift in the case of unseen objects. See pages 14 and 16 from "In this work we have explored .." through "Semantic-NeRF and NeSF as well."

**Additional Edits**

6. KQCV raised a concern about the claim "our method is the first to learn semantics from the geometry .."; the authors clarified the claim in discussion, noting that the method is the first to obtain "generalization by identifying patterns in **geometric space**".
   >> Edited the **Abstract**, last 3 sentences, to integrate the clarified novelty claim. See page 1 from "Our method is .." through "inference on novel scenes."
7. kHpB requested clarification of what the hyperparameters in Table 5 correspond to.
   >> Extended the **Table 5** caption to include the explicit meaning of each hyperparameter. See top of page 12, from "Hyperparameters correspond to the following .. " through "per layer of the MLP."
8. Suggestion for a consolidated section on failure cases.
   >> Edited the last paragraph of **Conclusions and Limitations** to additionally note "reducing floater artifacts" as part of future work, so that every failure noted throughout the work has a corresponding note on future work. (Including a separate Failure Cases section felt redundant.)
Assigned Action Editor: ~David_Fouhey2