Building a Special Representation for the Chinese Ancient Buildings in Diffusion models.

20 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Fine-tuning, Chinese Ancient Buildings, Pinyin, Diffusion Model
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Thanks to the strong generative ability of diffusion models, people can create a wide variety of images from their imagination using carefully designed prompts. In practice, the functional blocks that align prompts with image representations, such as CLIP, play the key role. Limited by their training data, these models perform worse in some rare domains, such as Chinese ancient buildings. The reason is the lack of dedicated representations for the elements of these buildings, such as brackets, roofs, and the differences between periods. In this paper, we first collect more than 400 images of ancient buildings and divide them into several subsets according to their common features. Second, pinyin, the basic tool for learning Chinese, is introduced into large models for the first time as a dedicated tool for describing the characteristics of these buildings. Third, we train several fine-tuned models to demonstrate the performance of our approach compared with existing models. Experiments show that our approach can overcome the barriers between English-centric models and other cultures.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2191
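
The abstract's idea of describing building elements with pinyin terms in prompts can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the glossary entries, the caption template, and the function name `build_caption` are hypothetical and are not taken from the submission.

```python
# Hypothetical glossary mapping English element names to pinyin terms.
# These entries are illustrative assumptions, not the paper's vocabulary.
PINYIN_TERMS = {
    "bracket set": "dougong",  # 斗拱
    "roof": "wuding",          # 屋顶
}


def build_caption(elements, period=None):
    """Compose a caption that names each building element by its pinyin term.

    `elements` is a list of keys from PINYIN_TERMS; `period` is an optional
    free-form tag (for example a dynasty name written in pinyin).
    """
    terms = []
    for element in elements:
        if element not in PINYIN_TERMS:
            raise KeyError(f"no pinyin term registered for {element!r}")
        terms.append(PINYIN_TERMS[element])

    caption = "a photo of a Chinese ancient building with " + " and ".join(terms)
    if period:
        caption += f", {period} style"
    return caption


if __name__ == "__main__":
    print(build_caption(["bracket set", "roof"], period="tangdai"))
```

Captions built this way could be paired with images when fine-tuning a text-to-image model, so that the text encoder sees a consistent token for each architectural element; whether the submission uses this exact scheme is not stated in the abstract.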