Abstract: Textual Inversion (TI) is efficient for text-to-image personalization but often fails on complex prompts. We identify embedding norm inflation as a key cause and show that token semantics are primarily encoded by embedding direction. We propose Directional Textual Inversion (DTI), which fixes the embedding magnitude to an in-distribution scale and optimizes only the direction via a simple MAP objective with a von Mises-Fisher prior. DTI improves prompt fidelity over existing embedding optimization baselines while maintaining competitive subject similarity. Furthermore, we demonstrate DTI's ease of integration and its use in creative applications.
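The abstract's core idea, fixing the embedding magnitude and optimizing only the direction under a von Mises-Fisher prior, can be illustrated with a minimal PyTorch sketch. All names, dimensions, hyperparameters, and the stand-in reconstruction loss below are illustrative assumptions, not the paper's implementation; in practice the reconstruction term would be the usual diffusion denoising loss computed on prompts containing the learned token.

```python
import torch
import torch.nn.functional as F

# Assumed setup (not from the paper): CLIP-style text-embedding width,
# a placeholder in-distribution magnitude, and a coarse class-word
# embedding standing in for the vMF prior mean.
embed_dim = 768
fixed_norm = 0.4                                          # placeholder magnitude
prior_mean = F.normalize(torch.randn(embed_dim), dim=0)   # stand-in prior direction
kappa = 1.0                                               # assumed vMF concentration

# Only the direction is a learnable parameter.
direction = torch.nn.Parameter(F.normalize(torch.randn(embed_dim), dim=0))
optimizer = torch.optim.AdamW([direction], lr=5e-3)

def concept_embedding():
    # Magnitude is held fixed at an in-distribution scale;
    # semantics are carried entirely by the unit direction.
    return fixed_norm * F.normalize(direction, dim=0)

# Stand-in target providing a gradient signal; in the real method this role
# is played by the diffusion model's reconstruction loss.
target = torch.randn(embed_dim)

for step in range(100):
    emb = concept_embedding()
    recon_loss = ((emb - target) ** 2).sum()
    # vMF log-prior over unit vectors reduces to a cosine-alignment term,
    # so the MAP objective subtracts kappa * <direction, prior_mean>.
    log_prior = kappa * torch.dot(F.normalize(direction, dim=0), prior_mean)
    loss = recon_loss - log_prior
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```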
Submission Number: 55