Abstract: Large-scale language-image pre-trained models such as CLIP have recently drawn much attention for their remarkable performance on various tasks, including classification and image synthesis. Combining CLIP with a GAN enables text-based image manipulation and text-based image synthesis, and several such combined models have been proposed so far. However, their effectiveness in the food image domain has not yet been examined comprehensively. In this paper, we report the results of experiments on text-based food image manipulation using VQGAN-CLIP and discuss the feasibility of manipulating food images with text.