Orthogonal Drift Correction (ODC): Improving Semantic Alignment via Training-Free Embedding Refinement

ICLR 2026 Conference Submission 18048 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: inference-time guidance, prompt-image alignment, training-free, diffusion models, text-to-image generation
Abstract: Text-to-image models have achieved remarkable success in generating high-quality images from textual descriptions. However, they often struggle with "semantic drift," where the generated output fails to align perfectly with complex or nuanced text prompts. In this paper, we introduce Orthogonal Drift Correction (ODC), a novel, inference-time guidance technique designed to mitigate semantic drift without requiring any model retraining. ODC guides the image generation process through a two-stage mechanism. It first generates an initial image, then uses a pre-trained vision-language model to compute a semantic error vector between this image and the prompt. Next, we isolate the component of this error vector that is orthogonal to the prompt's direction, hypothesizing that this component represents the most detrimental, off-topic drift. By subtracting this orthogonal error vector, we create a refined conditioning vector for a second, corrected generation pass. Our experiments demonstrate that ODC significantly enhances prompt-image alignment, leading to images that more accurately reflect detailed compositional and attribute-based instructions. As a plug-and-play module, ODC offers a practical and computationally efficient method for improving the reliability of state-of-the-art text-to-image models.
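The correction step described in the abstract can be sketched in a few lines. The following is an illustrative Python/NumPy sketch, not the authors' implementation: the function name, the `strength` parameter, and the use of generic embedding vectors (e.g. from a CLIP-style encoder) are assumptions for exposition.

```python
import numpy as np

def orthogonal_drift_correction(prompt_emb, image_emb, strength=1.0):
    """Illustrative sketch of ODC's refinement step (names are assumptions).

    prompt_emb: embedding of the text prompt (e.g. a CLIP text embedding)
    image_emb:  embedding of the initial generated image
    Returns a refined conditioning vector for the second generation pass.
    """
    # Semantic error vector between the first-pass image and the prompt.
    error = image_emb - prompt_emb
    # Project the error onto the prompt's direction.
    p_unit = prompt_emb / np.linalg.norm(prompt_emb)
    parallel = np.dot(error, p_unit) * p_unit
    # Isolate the component orthogonal to the prompt: the off-topic drift.
    orthogonal = error - parallel
    # Subtract the orthogonal drift to obtain the refined conditioning vector.
    return prompt_emb - strength * orthogonal
```

With toy 2-D vectors, a prompt embedding of `[1, 0]` and an image embedding of `[1, 1]` yield an error of `[0, 1]`, which is entirely orthogonal to the prompt; the refined conditioning vector pushes the second pass away from that drift direction.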
Primary Area: generative models
Submission Number: 18048