CoTextor: Training-Free Modular Multilingual Text Editing via Layered Disentanglement and Depth-Aware Fusion

Published: 27 Sept 2025, Last Modified: 09 Nov 2025
Venue: NeurIPS Creative AI Track 2025
Readers: Everyone
License: CC BY 4.0
Track: Paper
Keywords: Multilingual Text Editing, Human-AI Co-Creation, Layered Disentanglement, Training-Free
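TL;DR: CoTextor is a modular, training-free framework for multilingual text editing in images, built from publicly available pretrained components.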
Abstract: We introduce CoTextor, a modular and training-free framework for multilingual text editing in images, designed to support human-AI co-creation through a user-controllable and reversible workflow. Unlike diffusion-based systems that operate as black boxes, CoTextor separates the editing process into transparent layers (foreground extraction, background inpainting, semantic rewriting, and depth-aware reintegration), allowing precise user-guided operations such as rotation, translation, scaling, and warping. To ensure realism, we introduce a perceptually guided integration module that enhances photometric and geometric coherence during text reinsertion. Built entirely from publicly available pretrained components, CoTextor is accessible to non-technical, multilingual users, requiring no retraining or annotation. Through real-world scenarios in poster localization, street art remixing, and educational content creation, we demonstrate how CoTextor enables inclusive and expressive visual storytelling across cultural and linguistic contexts.
Submission Number: 14
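
Illustrative sketch (not from the paper): the abstract describes user-guided rotation, translation, and scaling of an extracted text layer, followed by reinsertion over an inpainted background. The NumPy snippet below shows one way such a reversible layer edit could be expressed, as a single homogeneous affine matrix plus plain alpha blending. The function names `affine_about_center` and `alpha_composite` are hypothetical and are not CoTextor's API; warping and the depth-aware, perceptually guided fusion are not modelled here.

```python
import numpy as np


def affine_about_center(center, angle_deg=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Build a 3x3 homogeneous matrix that scales and rotates points about
    `center`, then translates them by `shift` (hypothetical helper)."""
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    c = np.cos(theta) * scale
    s = np.sin(theta) * scale
    # Move the layer center to the origin.
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    # Combined rotation and uniform scaling about the origin.
    rot_scale = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)
    # Move back to the center and apply the user-requested translation.
    back = np.array(
        [[1, 0, cx + shift[0]], [0, 1, cy + shift[1]], [0, 0, 1]], dtype=float
    )
    return back @ rot_scale @ to_origin


def alpha_composite(foreground, alpha, background):
    """Blend an (H, W, C) foreground over a background using an (H, W)
    alpha mask with values in [0, 1] (assumed stand-in for reintegration)."""
    a = alpha[..., None]  # broadcast the mask over the channel axis
    return a * foreground + (1.0 - a) * background


# Example edit: rotate 90 degrees, double the size, shift 5 px along x.
M = affine_about_center(center=(10.0, 20.0), angle_deg=90.0, scale=2.0, shift=(5.0, 0.0))
moved = M @ np.array([11.0, 20.0, 1.0])

# Because the edit is a single invertible matrix, it can be undone exactly.
restored = np.linalg.inv(M) @ moved

# Blend a white text layer over a black background at 25% opacity.
blended = alpha_composite(
    np.ones((2, 2, 3)), np.full((2, 2), 0.25), np.zeros((2, 2, 3))
)
```

Keeping each user operation as an invertible matrix is one simple way to obtain the reversibility the abstract mentions: undoing an edit amounts to applying the inverse matrix (valid whenever `scale` is non-zero).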