Abstract: Generative modeling in machine learning aims to synthesize new data samples that are statistically similar to those observed during training. While conventional generative models such as GANs and diffusion models typically assume access to large and diverse datasets, many real-world applications (e.g., in medicine, satellite imaging, and artistic domains) operate under limited data availability and strict constraints. In this survey, we examine Generative Modeling under Data Constraint (GM-DC), which includes limited-data, few-shot, and zero-shot settings. We present a unified perspective on the key challenges in GM-DC, including overfitting, frequency bias, and incompatible knowledge transfer, and discuss how these issues impact model performance.
To systematically analyze this growing field, we introduce two novel taxonomies: one categorizing GM-DC tasks (e.g., unconditional vs. conditional generation, cross-domain adaptation, and subject-driven modeling), and another organizing methodological approaches (e.g., transfer learning, data augmentation, meta-learning, and frequency-aware modeling).
Our study reviews over 230 papers, offering a comprehensive view across generative model types and constraint scenarios. We further analyze task-approach-method interactions using a Sankey diagram and highlight promising directions for future work, including adaptation of foundation models, holistic evaluation frameworks, and data-centric strategies for sample selection.
This survey provides a timely and practical roadmap for researchers and practitioners aiming to advance generative modeling under limited data. Project website: https://anonymous4mysubmission.github.io/gmdc-survey/.
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=47zW24uukd&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: **Editor’s comment on previous submission:**
> The submission provides a link to supposedly anonymized code, but the authors’ names can be found on the GitHub README. I ask the authors to address this before resubmitting (please make sure that the author names are not only not displayed anymore, but that they are not viewable through the commit history when you resubmit).
**Response and changes made:**
Following the editor’s comment, we carefully reviewed the provided repository and removed all information that could potentially reveal author identities, including traces from the commit history and README. The code repository is now fully anonymized in accordance with TMLR's submission guidelines.
Assigned Action Editor: ~Gabriel_Loaiza-Ganem1
Submission Number: 5456