TL;DR: We propose a generative framework for INRs that integrates a Transformer-based hypernetwork decoder into latent diffusion models, enabling scalable INR generation and efficient adaptation via hyper-transforming, which fine-tunes only the decoder.
Abstract: We introduce a novel generative framework for functions by integrating Implicit Neural Representations (INRs) and Transformer-based hypernetworks into latent variable models. Unlike prior approaches that rely on MLP-based hypernetworks with scalability limitations, our method employs a Transformer-based decoder to generate INR parameters from latent variables, improving both representational capacity and computational efficiency. Our framework extends latent diffusion models (LDMs) to INR generation by replacing standard decoders with a Transformer-based hypernetwork, which can be trained either from scratch or via hyper-transforming—a strategy that fine-tunes only the decoder while freezing the pre-trained latent space. This enables efficient adaptation of existing generative models to INR-based representations without requiring full retraining. We validate our approach across multiple modalities, demonstrating improved scalability, expressiveness, and generalization over existing INR-based generative models. Our findings establish a unified and flexible framework for learning structured function representations.
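As a rough illustration of the two ideas above—not the paper's actual implementation; all module names, layer sizes, and the toy INR are assumptions—the following PyTorch sketch shows (i) a Transformer that decodes a latent code into the flattened weights of a small coordinate MLP, and (ii) a hyper-transforming-style update in which a frozen pretrained encoder supplies the latents and only the hypernetwork decoder receives gradients.

```python
import torch
import torch.nn as nn

class TransformerHyperDecoder(nn.Module):
    """Sketch of a Transformer-based hypernetwork: latent code -> flattened INR weights."""
    def __init__(self, latent_dim=64, token_dim=128, n_tokens=2, params_per_token=128):
        super().__init__()
        self.cond = nn.Linear(latent_dim, token_dim)                  # latent -> conditioning token
        self.tokens = nn.Parameter(torch.randn(n_tokens, token_dim))  # one learned token per weight group
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(token_dim, params_per_token)            # token -> slice of INR weights

    def forward(self, z):                                             # z: (B, latent_dim)
        b = z.size(0)
        seq = torch.cat([self.cond(z).unsqueeze(1),
                         self.tokens.unsqueeze(0).expand(b, -1, -1)], dim=1)
        h = self.body(seq)[:, 1:]                                     # keep only the weight tokens
        return self.head(h).flatten(1)                                # (B, n_tokens * params_per_token)

def render_inr(flat_params, coords):
    """Evaluate a tiny SIREN-style INR (2 -> 32 -> 3) whose weights come from flat_params."""
    W1 = flat_params[:, :64].view(-1, 2, 32)      # first-layer weights
    b1 = flat_params[:, 64:96].view(-1, 1, 32)    # first-layer bias
    W2 = flat_params[:, 96:192].view(-1, 32, 3)   # output-layer weights
    b2 = flat_params[:, 192:195].view(-1, 1, 3)   # output-layer bias
    h = torch.sin(coords @ W1 + b1)               # sinusoidal hidden activation
    return h @ W2 + b2                            # (B, N, 3) predicted signal values

# Hyper-transforming (sketch): the pretrained encoder and diffusion prior stay frozen;
# only the new hypernetwork decoder is trained. The encoder below is a toy stand-in.
torch.manual_seed(0)
N = 16 * 16                                                          # pixels per toy image
frozen_encoder = nn.Sequential(nn.Flatten(1), nn.Linear(N * 3, 64))  # stand-in for a pretrained LDM encoder
frozen_encoder.requires_grad_(False)

decoder = TransformerHyperDecoder()
opt = torch.optim.Adam(decoder.parameters(), lr=1e-4)

coords = torch.rand(8, N, 2)                                         # query coordinates in [0, 1]^2
targets = torch.rand(8, N, 3)                                        # toy RGB values at those coordinates
with torch.no_grad():
    z = frozen_encoder(targets)                                      # latents from the frozen latent space
loss = nn.functional.mse_loss(render_inr(decoder(z), coords), targets)
loss.backward()                                                      # gradients flow only into the decoder
opt.step()
```

Because the INR is queried at continuous coordinates, the same decoded weights can be evaluated at any resolution; in this toy setup, the trainable adapter is just the decoder, mirroring the efficiency claim above.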
Lay Summary: Modern AI models often treat data like images, 3D shapes, or climate maps as fixed grids of values. This discretized view overlooks the continuous nature of real-world signals, potentially discarding rich structural information. Earlier approaches tried to address this with implicit neural representations (INRs), but they often struggled to model complex data efficiently, limiting their scalability across tasks and resolutions.
We propose a generative model that represents data as continuous functions rather than discrete grids. By combining powerful diffusion models with a lightweight Transformer decoder, it learns flexible internal representations for generating, reconstructing, and completing data across diverse domains. Crucially, it builds on pretrained models and requires training only a small adapter, making it scalable and efficient.
This work paves the way for AI systems that are more adaptable and resolution-independent. It has implications for fields like medical imaging, scientific simulations, and graphics, where high-quality, flexible data generation is essential.
Link To Code: https://github.com/ipeis/LDMI
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Latent Diffusion Models, Transformers, INRs
Submission Number: 15913