Abstract: Implicit Neural Representations (INRs) encode data as continuous functions parameterized by a neural network, so the information content of the data lives in the parameter space. Modeling the distribution of these parameters is therefore crucial for building generalizable INRs. Existing approaches learn a joint distribution over the parameters via a single latent vector to generate new data, but such a flat latent often fails to capture the inherent hierarchical structure of the parameter space, leading to entangled data semantics and limited control over the generation process. Here, we propose a **C**ontrollable **H**ierarchical **I**mplicit **N**eural **R**epresentation (**CHINR**) framework, which explicitly models conditional dependencies across layers in the parameter space. Our method consists of two stages. In Stage-1, we construct a Layers-of-Experts (LoE) network in which each layer modulates distinct semantics through its own latent vector, enabling disentangled and expressive representations. In Stage-2, we introduce a Hierarchical Conditional Diffusion Model (HCDM) that captures conditional dependencies across layers, allowing controllable, hierarchical data generation at various semantic granularities. Extensive experiments across different modalities demonstrate that CHINR improves generalizability and offers flexible hierarchical control over the generated content.
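To make the Stage-1 idea concrete, the PyTorch sketch below shows one way a layer-wise modulated INR in the spirit of the LoE network could be organized: each layer's activations are scaled and shifted by its own latent code, so editing a single code affects only the semantics that layer controls. This is a minimal illustration under assumptions, not the paper's implementation; the class names (`LoELayer`, `LayerwiseINR`), the FiLM-style scale-and-shift modulation, the sinusoidal activation, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn


class LoELayer(nn.Module):
    """One INR layer whose activations are modulated by a layer-specific latent code."""

    def __init__(self, in_dim, out_dim, latent_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # Map the layer's latent to a per-channel scale and shift (FiLM-style assumption).
        self.to_scale = nn.Linear(latent_dim, out_dim)
        self.to_shift = nn.Linear(latent_dim, out_dim)

    def forward(self, x, z):
        h = self.linear(x)
        h = self.to_scale(z) * h + self.to_shift(z)  # the latent modulates this layer only
        return torch.sin(h)  # sinusoidal activation, a common INR choice (assumption here)


class LayerwiseINR(nn.Module):
    """Stack of LoE layers; layer l receives its own latent code z_l."""

    def __init__(self, coord_dim=2, hidden_dim=128, out_dim=3, latent_dim=64, num_layers=4):
        super().__init__()
        dims = [coord_dim] + [hidden_dim] * num_layers
        self.layers = nn.ModuleList(
            [LoELayer(dims[i], dims[i + 1], latent_dim) for i in range(num_layers)]
        )
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, coords, latents):
        # `latents` is a list [z_1, ..., z_L]; editing a single z_l changes only the
        # semantics modulated by that layer (e.g. coarse layout vs. fine detail).
        h = coords
        for layer, z in zip(self.layers, latents):
            h = layer(h, z)
        return self.head(h)


# Usage: predict RGB values at 2D pixel coordinates from four layer-wise codes.
inr = LayerwiseINR()
coords = torch.rand(1024, 2)                   # sampled (x, y) coordinates in [0, 1]^2
latents = [torch.randn(64) for _ in range(4)]  # one latent code per layer
rgb = inr(coords, latents)                     # shape: (1024, 3)
```

Under such a factorization, Stage-2 would fit a conditional generative model (the HCDM in the abstract) over the codes z_1, ..., z_L, sampling each layer's code conditioned on those of earlier layers to obtain the hierarchical control described above.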
Lay Summary: We propose a new AI algorithm that learns to generate images, shapes, or motions in a more controllable way. Our method builds on a class of AI models called Implicit Neural Representations (INRs), which represent data not as pixels or points but as continuous functions defined by the parameters of a neural network. The key idea is that these parameters naturally follow a layered structure: earlier layers capture broad features such as shape or pose, while later layers refine details such as texture or expression.
To take advantage of this structure, we designed a two-step method: first, we learn a model that represents data with layer-wise control codes; then, we train a second model to capture how these codes depend on each other. This lets us generate new content by adjusting one layer at a time, giving fine-grained control over what gets created, for example, changing a facial expression without altering the face's overall shape. Our system works across different types of data, making it easier to steer AI generation in meaningful ways.
Primary Area: Deep Learning->Other Representation Learning
Keywords: implicit neural representations, diffusion model
Submission Number: 8615