Voxify3D: From Mesh to Voxel Art with Palette Discretization and Semantic Guidance

12 Sept 2025 (modified: 14 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | Readers: Everyone | CC BY 4.0
Keywords: Voxel art, 3D stylization, Neural voxel grid, Differentiable rendering, Lego, Voxel, Art, Design, Game, Fabrication
TL;DR: Voxify3D is a differentiable two-stage pipeline that converts 3D meshes into stylized voxel art via palette-based quantization, preserving semantics and abstraction across varied palettes, color counts, and voxel resolutions.
Abstract: Voxel art is a distinctive stylization widely used in games and digital media, yet creating it from 3D meshes remains labor-intensive. Existing approaches, such as downsampling or direct editing, neither capture the abstract aesthetic nor preserve essential details. We introduce Voxify3D, a differentiable two-stage framework for stylized voxel art generation. First, a coarse voxel grid is initialized via voxel-based 3D reconstruction. Then, the grid is refined under six-view orthographic pixel-art supervision, with colors constrained to discrete palettes. Our method incorporates (1) orthographic projection with pixel-art supervision to preserve sharp and essential abstract details, (2) a patch-level perceptual loss to preserve distinctive semantic features, and (3) a differentiable palette-based quantization scheme leveraging Gumbel-Softmax, which produces clear voxel renderings with distinct tonal abstraction. Experiments and user studies show that Voxify3D achieves superior visual quality, semantic fidelity, and pixel-level aesthetics compared to prior methods, providing a practical solution for automated voxel art creation.
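The abstract names Gumbel-Softmax as the mechanism that keeps palette selection differentiable but gives no further detail here. The following is a minimal sketch of the general technique for a single voxel, not the authors' implementation: the function names, the three-color palette, the logits, and the temperature are all hypothetical, and a real system would use an autodiff framework so that gradients reach the logits.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return relaxed one-hot weights over palette entries (hypothetical helper)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
        # Clamp U away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_palette_color(weights, palette):
    """Blend palette colors with the relaxed weights (used while optimizing)."""
    return tuple(
        sum(w * color[ch] for w, color in zip(weights, palette))
        for ch in range(3)
    )


def hard_palette_color(logits, palette):
    """Pick the single palette color with the largest logit (used for final output)."""
    index = max(range(len(logits)), key=lambda i: logits[i])
    return palette[index]


# Assumed example values: an RGB palette in [0, 1] and per-voxel palette logits.
palette = [(0.9, 0.1, 0.1), (0.1, 0.8, 0.2), (0.1, 0.2, 0.9)]
logits = [0.1, 2.0, -1.0]
rng = random.Random(0)

weights = gumbel_softmax(logits, temperature=0.5, rng=rng)
blended = soft_palette_color(weights, palette)
final = hard_palette_color(logits, palette)
```

Lower temperatures push the weights toward a one-hot vector, so the blended color approaches a single palette entry; the hard selection is what guarantees that every exported voxel uses exactly one palette color.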
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4533