Spatially-aware Weights Tokenization for NeRF-Language Models

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 posterEveryoneRevisionsBibTeXCC BY-NC 4.0
Keywords: MLLM, LLM, NeRF
Abstract: Neural Radiance Fields (NeRFs) are neural networks -- typically multilayer perceptrons (MLPs) -- that represent the geometry and appearance of objects, with applications in vision, graphics, and robotics. Recent works propose understanding NeRFs with natural language using Multimodal Large Language Models (MLLMs) that directly process the weights of a NeRF's MLP. However, these approaches rely on a global representation of the input object, making them unsuitable for spatial reasoning and fine-grained understanding. In contrast, we propose **weights2space**, a self-supervised framework featuring a novel meta-encoder that can compute a sequence of spatial tokens directly from the weights of a NeRF. Leveraging this representation, we build **Spatial LLaNA**, a novel MLLM for NeRFs, capable of understanding details and spatial relationships in objects represented as NeRFs. We evaluate Spatial LLaNA on NeRF captioning and NeRF Q&A tasks, using both existing benchmarks and our novel **Spatial ObjaNeRF** dataset consisting of $100$ manually-curated language annotations for NeRFs. This dataset features 3D models and descriptions that challenge the spatial reasoning capability of MLLMs. Spatial LLaNA outperforms existing approaches across all tasks.
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 15479
Loading