Abstract: Advances in computer vision are pushing the limits of image manipulation, with generative models producing highly realistic, detailed images across a variety of tasks. However, a specialized model is often developed and trained for each specific task, even though many image editing tasks share similarities: in denoising, inpainting, or image compositing, one always aims at generating a realistic image from a low-quality one. In this paper, we take a step towards a unified approach for image editing. To do so, we propose EdiBERT, a bidirectional transformer that re-samples image patches conditioned on a given image. Using a single generic objective, we show that the model resulting from one training matches state-of-the-art GAN inversion methods on several tasks: image denoising, image completion, and image compositing. We also provide several insights into the latent space of vector-quantized auto-encoders, such as its locality and reconstruction capacities. The code is available at https://github.com/EdiBERT4ImageManipulation/EdiBERT.
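The abstract's core idea — re-sampling discrete image patches conditioned on the rest of the image — can be illustrated with a minimal sketch. This is not the paper's code: `resample_tokens` and `predict_candidates` are hypothetical names, and the "model" here is a stand-in for the bidirectional transformer operating on vector-quantized token sequences.

```python
import random

def resample_tokens(tokens, mask_positions, predict_candidates, sample=random.choice):
    """Replace tokens at mask_positions with model-sampled tokens,
    keeping all other positions fixed (bidirectional conditioning).

    tokens: list of discrete codebook indices (e.g., from a VQ auto-encoder).
    predict_candidates(seq, pos): returns candidate token indices for position
    pos given the full sequence -- a stand-in for the transformer's prediction.
    """
    out = list(tokens)
    for pos in mask_positions:
        # The model sees the whole (partially edited) sequence, so each
        # re-sampled patch is conditioned on the surrounding image content.
        candidates = predict_candidates(out, pos)
        out[pos] = sample(candidates)
    return out
```

Only the edited positions are re-sampled; the remaining tokens act as the conditioning context, which is what lets one generic objective cover denoising, completion, and compositing.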
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: #1st revision
In response to the reviewers' requests, we made the following changes:
- We clarified the introduction, and briefly explained the difference between the denoising and inpainting tasks.
- After recalling the formula of the attention mechanism, we detailed the training objective of VQGAN in the related work and explained how our work builds on top of it.
- We added Figure 2 to explain and better visualize the training of EdiBERT and, more specifically, our 2D selection strategy.
- We changed Figure 4 to compare EdiBERT with GANs inversion methods on the task of reconstructing target images and added a quantitative comparison based on LPIPS.
- We added Figure 7 for a better comparison of EdiBERT on the task of inpainting.
- Finally, we clarified some formulas and extended many figure captions.
#2nd revision
- Updated related work with more diffusion-related papers.
- Added a new figure in the Appendix displaying qualitative results on the task of image compositing.
Code: https://github.com/EdiBERT4ImageManipulation/EdiBERT
Assigned Action Editor: ~Jia-Bin_Huang1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 299