Abstract: Highlights
• Proposed a hierarchical learning method for text-guided image inpainting.
• The object-fine-grained learning stage focuses on the visual semantics of objects of interest.
• Designed a mask reconstruction module focusing on the object of interest.
• Explored a multi-attention mechanism to fuse visual and textual semantics.
• Devised a flexible discriminator to penalize the corrupted area.