Abstract: Generative models often struggle to produce data beyond the training distribution. To address this, we explored various methods for unseen data generation, eventually focusing on compositional zero-shot image generation. By guiding the generative model with compositional class labels, we achieved finer control over the generation process: our model can generate unseen images whose compositional labels do not appear in the training set. While large language models such as GPT-4 and image generation models such as DALL-E offer similar zero-shot generation capabilities, our research emphasizes domain-specific zero-shot generation with smaller models. Through a series of tasks of varying complexity, we demonstrate the effectiveness of compositional zero-shot image generation, showcasing its potential in contemporary machine learning.
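To make the idea of "guiding the generative model with compositional class labels" concrete, the following is a minimal sketch, not the authors' architecture: a toy generator conditioned on separate attribute and object label embeddings, so an unseen (attribute, object) combination can be requested at sampling time. All names, dimensions, and the network layout are illustrative assumptions.

```python
# Illustrative sketch (not the paper's model): a generator conditioned on a
# compositional (attribute, object) label pair. Because the two factors are
# embedded independently, a combination never seen during training can still
# be supplied as a condition at sampling time.
import torch
import torch.nn as nn


class CompositionalGenerator(nn.Module):
    def __init__(self, n_attrs, n_objs, latent_dim=64, embed_dim=32, img_dim=28 * 28):
        super().__init__()
        # Separate embeddings for the two label factors; their concatenation
        # forms the compositional condition vector.
        self.attr_embed = nn.Embedding(n_attrs, embed_dim)
        self.obj_embed = nn.Embedding(n_objs, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 2 * embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
            nn.Tanh(),  # outputs in [-1, 1], matching images normalized that way
        )

    def forward(self, z, attr_ids, obj_ids):
        cond = torch.cat([self.attr_embed(attr_ids), self.obj_embed(obj_ids)], dim=-1)
        return self.net(torch.cat([z, cond], dim=-1))


# Usage: sample with a hypothetical composition absent from training,
# e.g. attribute 3 combined with object 7.
gen = CompositionalGenerator(n_attrs=10, n_objs=10)
z = torch.randn(4, 64)
attrs = torch.full((4,), 3, dtype=torch.long)
objs = torch.full((4,), 7, dtype=torch.long)
images = gen(z, attrs, objs)  # shape: (4, 784)
```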