Keywords: federated learning, diffusion model, collaborative machine learning, synthetic data, cooperative inference
TL;DR: A cooperative inference framework for pre-trained diffusion models that enables collaborative synthetic dataset generation
Abstract: Diffusion models have demonstrated decent generation quality, yet their deployment in federated learning scenarios remains challenging. Due to data heterogeneity and the large number of parameters, conventional parameter-averaging schemes often fail to achieve stable collaborative training of diffusion models. We reframe collaborative synthetic data generation as a cooperative sampling procedure over a mixture of decentralized distributions, each encoded by a pre-trained local diffusion model. This leverages the connection between diffusion models and energy-based models, which readily supports compositional generation. Consequently, we can directly obtain a refined synthetic dataset, optionally with differential privacy guarantees, even without exchanging diffusion model parameters. Our framework, realized through an unadjusted Langevin algorithm with a convergence guarantee, reduces communication overhead while maintaining generation quality.
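Since only the abstract is available, the following is a minimal sketch of the kind of compositional sampling it describes: an unadjusted Langevin algorithm targeting a mixture of local distributions, assuming each client exposes an unnormalized energy and its score (as in the energy-based view of diffusion models). The function names, the shared-normalizer assumption, and the toy Gaussian "clients" are illustrative, not taken from the paper.

```python
import torch

def mixture_score(x, log_w, energy_fns, score_fns):
    """Score of the mixture sum_i w_i p_i(x).

    Assumes log p_i(x) = -E_i(x) + const with a normalizer shared
    across clients (an assumption of this sketch, not of the paper).
    """
    log_p = torch.stack([lw - E(x) for lw, E in zip(log_w, energy_fns)])  # (N, B)
    resp = torch.softmax(log_p, dim=0).unsqueeze(-1)                      # (N, B, 1) responsibilities
    s = torch.stack([f(x) for f in score_fns])                            # (N, B, D) local scores
    return (resp * s).sum(dim=0)                                          # (B, D) mixture score

def ula_sample(x0, log_w, energy_fns, score_fns, step=1e-3, n_steps=2000):
    """Unadjusted Langevin algorithm: x <- x + eta*score + sqrt(2*eta)*noise."""
    x = x0.clone()
    for _ in range(n_steps):
        x = (x + step * mixture_score(x, log_w, energy_fns, score_fns)
             + (2 * step) ** 0.5 * torch.randn_like(x))
    return x

if __name__ == "__main__":
    # Toy stand-ins for pre-trained local models: two 1-D Gaussian "clients".
    mus = [torch.tensor([-2.0]), torch.tensor([3.0])]
    energy_fns = [lambda x, m=m: 0.5 * ((x - m) ** 2).sum(-1) for m in mus]
    score_fns = [lambda x, m=m: -(x - m) for m in mus]
    log_w = torch.log(torch.tensor([0.5, 0.5]))
    samples = ula_sample(torch.randn(512, 1), log_w, energy_fns, score_fns, step=1e-2)
    print(samples.mean().item(), samples.std().item())  # bimodal: roughly 0.5 mean, large std
```

In this view, no model parameters leave the clients: each Langevin step only needs local energy and score evaluations at the current sample, which is consistent with the communication-efficiency claim in the abstract.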
Primary Area: Probabilistic methods (e.g., variational inference, causal inference, Gaussian processes)
Submission Number: 25315