Tokenize Image as a Set

18 Sept 2025 (modified: 11 Feb 2026)Submitted to ICLR 2026EveryoneRevisionsBibTeXCC BY 4.0
Keywords: Image Generation; Image Tokenizer; Set Modeling
TL;DR: A novel framework for set-based image tokenization and fixed-sum discrete generative modeling.
Abstract: This paper proposes a new paradigm for image generation through set-based tokenization and modeling. Unlike conventional methods that serialize images into fixed-position latent codes with a uniform compression ratio, we introduce an unordered token set representation to dynamically allocate coding capacity based on regional semantic complexity. This TokenSet enhances global context aggregation and improves robustness against local perturbations. To address the critical challenge of modeling discrete sets, we devise a dual transformation mechanism that bijectively converts sets into fixed-length integer sequences while preserving summation constraints. Further, we propose Fixed-Sum Discrete Diffusion—the first framework to simultaneously handle discrete values, fixed sequence length, and summation invariance—enabling effective set distribution modeling. Experiments demonstrate our method's superiority in semantic-aware representation and generation quality. Our innovations, spanning novel representation and modeling strategies, advance visual generation beyond traditional sequential token paradigms.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 11186
Loading