Keywords: 3D Style Transfer, 3D Gaussian Splatting, Single-forward Stylization, 3D Reconstruction
TL;DR: Stylos is a single-forward framework that performs geometry-aware, view-consistent 3D style transfer from unposed content images using a reference style image, without per-scene optimization or precomputed poses.
Abstract: We present Stylos, a single-forward 3D Gaussian framework for 3D style transfer that operates on unposed content, from a single image to a multi-view collection, conditioned on a separate reference style image. Stylos synthesizes a stylized 3D Gaussian scene without per-scene optimization or precomputed poses, achieving geometry-aware, view-consistent stylization that generalizes to unseen categories, scenes, and styles. At its core, Stylos adopts a Transformer backbone with two pathways: the geometry pathway retains self-attention to preserve geometric fidelity, while style is injected via global cross-attention to enforce visual consistency across views. Together with a voxel-based 3D style loss that aligns aggregated scene features to style statistics, this design achieves view-consistent stylization while preserving geometry. Experiments across multiple datasets demonstrate that Stylos delivers high-quality zero-shot stylization, highlighting the effectiveness of global style–content coupling, the proposed 3D style loss, and the scalability of our framework from single-view to large-scale multi-view settings. Our code will be open-sourced soon.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 15555