GarmentGPT: Compositional Garment Pattern Generation via Discrete Latent Tokenization

Published: 26 Jan 2026, Last Modified: 17 Feb 2026. ICLR 2026 Poster. License: CC BY 4.0
Keywords: Garment Generation, vision language models
Abstract: Apparel is a fundamental component of human appearance, making garment digitization critical for digital human creation. However, sewing pattern creation traditionally relies on the intuition and extensive experience of skilled artisans, and this manual bottleneck significantly hinders the scalability of digital garment creation. Existing generative approaches either operate as data replicators without an intrinsic understanding of garment construction principles (e.g., diffusion models), or struggle with low-level regression of raw floating-point coordinates (e.g., Vision-Language Models). We present GarmentGPT, the first framework to operationalize latent-space generation for sewing patterns. Our approach introduces a novel pipeline in which an RVQ-VAE tokenizes continuous pattern boundary curves into discrete codebook indices. A fine-tuned Vision-Language Model then autoregressively predicts these discrete token sequences instead of regressing coordinates, enabling high-level compositional reasoning. This paradigm shift aligns generation with the knowledge-driven, symbolic reasoning capabilities of large language models. To address the data bottleneck for real-world applications, we develop a Data Curation Pipeline that synthesizes over one million photorealistic images paired with GarmentCode, and establish the Real-Garments Benchmark for comprehensive evaluation. Experiments demonstrate that GarmentGPT significantly outperforms existing methods on structured datasets (95.62\% Panel Accuracy, 81.84\% Stitch Accuracy), validating the advantages of our discrete compositional paradigm. Code is available at \url{https://github.com/ChimerAI-MMLab/Garment-GPT}.
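To make the core mechanism concrete, the sketch below illustrates residual vector quantization (RVQ), the operation the abstract describes for turning a continuous curve feature into a sequence of discrete codebook indices. This is a minimal, self-contained illustration, not the released implementation: the function names (`rvq_encode`, `rvq_decode`), the feature dimension, the stage count, and the frozen random codebooks are all hypothetical; in the paper's RVQ-VAE the codebooks would be learned jointly with an encoder and decoder over pattern boundary curves.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization: map a continuous feature vector to
    one discrete codebook index per quantization stage."""
    indices = []
    residual = x.astype(np.float64)
    for cb in codebooks:                       # cb has shape (codebook_size, dim)
        dists = np.linalg.norm(cb - residual, axis=1)
        idx = int(np.argmin(dists))            # nearest code at this stage
        indices.append(idx)
        residual = residual - cb[idx]          # pass the residual to the next stage
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct the feature by summing the selected codes across stages."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

rng = np.random.default_rng(0)
dim, num_stages, codebook_size = 8, 3, 256    # illustrative sizes, not the paper's
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(num_stages)]

# Stand-in for a flattened boundary-curve feature (e.g., one edge's control points).
curve_feature = rng.normal(size=dim)
tokens = rvq_encode(curve_feature, codebooks)
recon = rvq_decode(tokens, codebooks)
print("tokens:", tokens)                       # discrete indices a language model can predict
print("reconstruction error:", np.linalg.norm(curve_feature - recon))
```

The key property this sketch shows is why the tokenization suits autoregressive modeling: each stage emits a small integer from a fixed vocabulary, so a fine-tuned Vision-Language Model can treat the pattern as a token sequence rather than regressing raw floating-point coordinates.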
Primary Area: generative models
Submission Number: 1396