Abstract: Taking a good photograph can be a time-consuming process, and it usually takes several attempts to capture a moment correctly. This difficulty stems from the many factors that make up a photo, such as framing, perspective, exposure, focus, or subject pose. Getting even one of these factors wrong can spoil a picture, even if the rest are perfect. To make matters worse, many of these factors are often out of our control; for example, a wind gust may displace the subject's hair, or a bird may fly by and occlude the shot. What if we could go back and fix some of these aspects? In my thesis, I explore techniques for "scene rerendering" which enable rich modification of media after capture.

First, I propose Nerfies, the first method capable of photorealistically reconstructing a non-rigidly deforming scene using photos and videos captured casually from mobile phones. Nerfies augments neural radiance fields (NeRF) by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. I show that these NeRF-like deformation fields are prone to local minima and propose a coarse-to-fine optimization method that allows for more robust optimization. By adapting principles from geometry processing and physical simulation to NeRF-like models, I also propose an elastic regularization of the deformation field that further improves robustness. I demonstrate how Nerfies can turn casually captured selfie photos and videos into deformable NeRF models that allow for photorealistic renderings of the subject from arbitrary viewpoints.

Deformation-based approaches such as Nerfies struggle to model changes in topology (e.g., slicing a lemon), because topological changes require a discontinuity in the deformation field, yet these deformation fields are necessarily continuous. I address this limitation in HyperNeRF by lifting NeRFs into a higher-dimensional space and representing the 5D radiance field corresponding to each input image as a slice through this "hyperspace." This approach is inspired by level set methods, which model the evolution of surfaces as slices through a higher-dimensional surface.

Next, I present PhotoShape, an approach that creates photorealistic, relightable 3D models automatically by assigning high-quality, realistic appearance models to large-scale 3D shape collections. By generating many synthetic renderings, I train a convolutional neural network to classify materials in real photos and employ 3D-2D alignment techniques to transfer materials to different parts of each shape model. The key idea is to jointly leverage three types of online data – shape collections, material collections, and photo collections – using the photos as references to guide the assignment of materials to shapes.

Finally, I show how methods for scene rerendering can be exploited to solve inverse problems. I propose LatentFusion, a framework for performing 3D reconstruction and rendering using a neural network. This network takes posed images of an object as input and can render it from any novel viewpoint. I show how LatentFusion can be used for 6D object pose estimation by optimizing the input pose as a free parameter using gradient descent. Since this method incorporates objects at inference time, it can perform pose estimation on unseen objects without additional training, an immense benefit over existing methods that require training a separate network for every new object.
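
To make the deformation-field idea concrete, the following is a minimal JAX sketch of the core Nerfies components described above: a coarse-to-fine (windowed) positional encoding, a per-frame deformation network that warps observed points into the canonical NeRF, and an elastic regularizer on the warp Jacobian. All names, network shapes, and hyperparameters here are illustrative rather than the released Nerfies code, and a simple translational offset field stands in for the SE(3) field used in the thesis.

```python
# Illustrative sketch only; function names and shapes are assumptions.
import jax
import jax.numpy as jnp

def windowed_posenc(x, num_freqs, alpha):
    """Positional encoding whose frequency bands fade in as `alpha` grows
    from 0 to num_freqs (the coarse-to-fine schedule)."""
    freqs = 2.0 ** jnp.arange(num_freqs)                      # (F,)
    xb = x[..., None, :] * freqs[:, None]                     # (..., F, D)
    four = jnp.concatenate([jnp.sin(xb), jnp.cos(xb)], -1)    # (..., F, 2D)
    # Cosine window: band j is fully enabled once alpha >= j + 1.
    w = 0.5 * (1.0 - jnp.cos(jnp.pi * jnp.clip(alpha - jnp.arange(num_freqs), 0.0, 1.0)))
    four = four * w[:, None]
    return jnp.concatenate([x, four.reshape(x.shape[:-1] + (-1,))], -1)

def mlp(params, h):
    """Plain ReLU MLP; `params` is a list of (W, b) pairs."""
    for W, b in params[:-1]:
        h = jax.nn.relu(h @ W + b)
    W, b = params[-1]
    return h @ W + b

def render_point(deform_params, nerf_params, frame_code, x, d, alpha):
    """Warp an observed point into the canonical frame (translational offset
    for simplicity), then query the canonical NeRF for (density, rgb)."""
    feat = jnp.concatenate([windowed_posenc(x, 8, alpha), frame_code])
    x_canonical = x + mlp(deform_params, feat)                 # predicted offset
    h = jnp.concatenate([windowed_posenc(x_canonical, 10, 10.0),
                         windowed_posenc(d, 4, 4.0)])
    out = mlp(nerf_params, h)
    return jax.nn.softplus(out[0]), jax.nn.sigmoid(out[1:4])

def elastic_loss(deform_params, frame_code, x, alpha):
    """Simplified elastic regularizer: penalize log-singular values of the
    warp Jacobian so the deformation stays locally close to a rigid motion."""
    warp = lambda p: p + mlp(deform_params,
                             jnp.concatenate([windowed_posenc(p, 8, alpha), frame_code]))
    J = jax.jacfwd(warp)(x)                                    # (3, 3)
    sigma = jnp.linalg.svd(J, compute_uv=False)
    return jnp.sum(jnp.log(sigma) ** 2)
```

In a full renderer these per-point functions would be vmapped over ray samples and combined with standard NeRF volume rendering; increasing `alpha` during training is what implements the coarse-to-fine schedule.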
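
The HyperNeRF idea of slicing a higher-dimensional field can be sketched in the same style, reusing `mlp` and `windowed_posenc` from the block above. The ambient dimensionality, the names, and the omission of the accompanying deformation field are simplifying assumptions, not the paper's implementation.

```python
def hypernerf_point(slice_params, nerf_params, frame_code, x, d):
    """HyperNeRF-style slicing sketch: a small MLP predicts ambient
    (hyperspace) coordinates for each observed point, so topological changes
    become smooth motions in the higher-dimensional template."""
    w = mlp(slice_params, jnp.concatenate([windowed_posenc(x, 6, 6.0), frame_code]))
    # Query the template field at the lifted coordinate (x, w).
    h = jnp.concatenate([windowed_posenc(x, 10, 10.0),
                         windowed_posenc(w, 1, 1.0),
                         windowed_posenc(d, 4, 4.0)])
    out = mlp(nerf_params, h)
    return jax.nn.softplus(out[0]), jax.nn.sigmoid(out[1:4])
```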
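
Finally, a hedged sketch of how a LatentFusion-style renderer could be used for pose estimation by gradient descent. Here `render_fn` and `object_latent` are hypothetical placeholders for a differentiable neural renderer and the object's latent reconstruction, and the photometric loss and plain gradient-descent update are simplified stand-ins rather than the exact objective used in the thesis.

```python
import jax
import jax.numpy as jnp

def axis_angle_to_matrix(r):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = jnp.linalg.norm(r) + 1e-8
    k = r / theta
    K = jnp.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return jnp.eye(3) + jnp.sin(theta) * K + (1.0 - jnp.cos(theta)) * (K @ K)

def estimate_pose(render_fn, object_latent, observed_rgb, observed_mask,
                  init_rot, init_trans, steps=200, lr=1e-2):
    """Treat the object pose as a free parameter and refine it by gradient
    descent through the (differentiable) neural renderer."""
    params = {"rot": init_rot, "trans": init_trans}

    def loss(p):
        R = axis_angle_to_matrix(p["rot"])
        rgb, mask = render_fn(object_latent, R, p["trans"])   # hypothetical renderer
        # Photometric + mask discrepancy against the observed image.
        return (jnp.mean(observed_mask[..., None] * (rgb - observed_rgb) ** 2)
                + jnp.mean((mask - observed_mask) ** 2))

    grad_fn = jax.jit(jax.grad(loss))
    for _ in range(steps):
        g = grad_fn(params)
        params = jax.tree_util.tree_map(lambda p, gi: p - lr * gi, params, g)
    return axis_angle_to_matrix(params["rot"]), params["trans"]
```

Because the object enters only through `object_latent` at inference time, the same optimization loop applies to objects never seen during training, which is the property highlighted in the abstract.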