Inference-Time Control for SVG Generation via Information-Projection Guided Constrained Decoding

ICLR 2026 Conference Submission15886 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: autoregressive SVG generation, constrained decoding, information projection, inference-time control

TL;DR: We cast control of autoregressive SVG generation as an information projection and realize it at decoding time with STaMP: a soft tilt that steers outputs toward user-specified controls and a hard mask that enforces valid syntax. On text-to-SVG and image-to-SVG tasks, STaMP yields valid, constraint-compliant outputs without retraining.

Abstract: Recent autoregressive models can generate SVG from text or images, but they fail to reliably follow user-specified constraints such as colors, layouts and fonts. This limitation highlights that controllability is the missing primitive in autoregressive vector generation. Prompt tinkering and post-hoc edits are brittle, and many practical systems either require retraining for each new constraint or fall back to raster outputs that must be vectorized, underscoring the absence of any autoregressive vector generation method that enables control at inference time. We hypothesize that precise, constraint-driven vector generation is fundamentally a decoding-time constraint-satisfaction problem. Formally, we cast this objective as finding the optimal controlled distribution: among all distributions that satisfy the constraints, select the one closest (in KL) to the base model. We show this distribution is the information projection (I-projection) of the base model onto the constrained set. Direct sampling from the I-projection is intractable, but its structure suggests a practical decomposition: a soft reweighting that steers probabilities toward the desired properties and a hard restriction that removes invalid continuations. Building on this insight, we introduce STaMP (Soft Tilt-and-Mask Policy), a model-agnostic, inference-time controller that adds fine-grained control (e.g., color, font, and layout) to any autoregressive SVG model. Evaluated across text-to-SVG and image-to-SVG settings on multiple open models, STaMP delivers inference-time control, consistently improves constraint adherence, and preserves the base model's output quality. Additionally, we introduce, to the best of our knowledge, the first text-to-design SVG model as an extended showcase: paired with STaMP, it produces full compositions as structured, editable SVG while honoring user-defined controls over color, typography, layout, and asset placement, all within a single inference pass.
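The tilt-and-mask decomposition described in the abstract can be illustrated with a minimal, generic sketch of a single decoding step. This is not the authors' implementation: the function name `tilt_and_mask`, the per-token `scores`, the tilt strength `lam`, and the toy vocabulary are all hypothetical and chosen only for illustration. The sketch assumes the soft step is an exponential reweighting of the base probabilities and the hard step is a restriction to an allowed token set followed by renormalization.

```python
import math

def tilt_and_mask(logprobs, scores, allowed, lam=1.0):
    """One illustrative decoding step (hypothetical helper, not from the paper).

    logprobs: dict mapping token -> base-model log-probability
    scores:   dict mapping token -> constraint score (higher = more compliant)
    allowed:  set of tokens that keep the output syntactically valid
    lam:      tilt strength; lam = 0 leaves only the mask
    """
    # Hard mask: drop tokens outside the allowed set.
    # Soft tilt: add lam * score to the log-probability of each kept token.
    tilted = {
        tok: lp + lam * scores.get(tok, 0.0)
        for tok, lp in logprobs.items()
        if tok in allowed
    }
    if not tilted:
        raise ValueError("the mask removed every candidate token")
    # Renormalize with a numerically stable softmax.
    peak = max(tilted.values())
    norm = sum(math.exp(v - peak) for v in tilted.values())
    return {tok: math.exp(v - peak) / norm for tok, v in tilted.items()}

# Toy example: two color tokens and one token assumed to be invalid here.
base = {
    "#ff0000": math.log(0.5),
    "#0000ff": math.log(0.3),
    "<<": math.log(0.2),
}
allowed = {"#ff0000", "#0000ff"}   # assumed output of a syntax checker
scores = {"#0000ff": 2.0}          # assumed preference for the blue token
probs = tilt_and_mask(base, scores, allowed, lam=1.0)
print(probs)
```

In this toy case the invalid token receives zero probability, and the blue token ends up with more mass than the red one even though the base model preferred red. Setting `lam=0.0` reduces the step to masking plus renormalization, which leaves the relative base-model probabilities of the valid tokens unchanged.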

Primary Area: generative models

Submission Number: 15886
