S5 Framework: A Review of Self-Supervised Shared Semantic Space Optimization for Multimodal Zero-Shot Learning

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission · Readers: Everyone
Abstract: In this review, we aim to inspire research into Self-Supervised Shared Semantic Space (S5) multimodal learning problems. We equip non-expert researchers with a framework of informed modeling decisions via an extensive literature review, an actionable modeling checklist, and a series of novel zero-shot evaluation tasks. The core idea of our S5 checklist is to learn contextual multimodal interactions at multiple levels of granularity via a shared Transformer encoder trained with a denoising loss term, regularized by a contrastive loss term that induces a semantic alignment prior on the contextual embedding space. Essentially, we aim to model human concept understanding and thus learn to "put a name to a face". This ultimately enables interpretable zero-shot S5 generalization on a variety of novel downstream tasks. In summary, this review provides sufficient background and actionable strategies for training cutting-edge S5 multimodal networks.
Paper Type: long
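
To make the objective described in the abstract concrete, below is a minimal PyTorch sketch of a shared Transformer encoder over text and image tokens, trained with a denoising (masked-reconstruction) loss plus an in-batch contrastive alignment loss. All names (SharedMultimodalEncoder, s5_loss), dimensions, and hyperparameters are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Illustrative sketch of an S5-style objective: a shared Transformer encoder
# over both modalities, a denoising (masked-token reconstruction) term, and a
# contrastive term that aligns matching text-image pairs in the shared space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedMultimodalEncoder(nn.Module):
    def __init__(self, vocab_size=30522, image_feat_dim=2048, d_model=256,
                 nhead=4, num_layers=2):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.image_proj = nn.Linear(image_feat_dim, d_model)   # project region features
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.mlm_head = nn.Linear(d_model, vocab_size)         # denoising (masked LM) head

    def forward(self, token_ids, image_feats):
        # Embed both modalities and encode them jointly in one shared space.
        text = self.text_embed(token_ids)                      # (B, Lt, d)
        image = self.image_proj(image_feats)                   # (B, Li, d)
        hidden = self.encoder(torch.cat([text, image], dim=1)) # (B, Lt+Li, d)
        return hidden[:, :text.size(1)], hidden[:, text.size(1):]

def s5_loss(model, token_ids, masked_ids, mask_positions, image_feats,
            temperature=0.07):
    """Denoising loss on masked text tokens + InfoNCE contrastive alignment."""
    text_hidden, image_hidden = model(masked_ids, image_feats)

    # Denoising term: reconstruct the original tokens at masked positions.
    logits = model.mlm_head(text_hidden)                       # (B, Lt, V)
    denoise = F.cross_entropy(logits[mask_positions], token_ids[mask_positions])

    # Contrastive term: pool each modality and align matching pairs in-batch.
    text_vec = F.normalize(text_hidden.mean(dim=1), dim=-1)    # (B, d)
    image_vec = F.normalize(image_hidden.mean(dim=1), dim=-1)  # (B, d)
    sim = text_vec @ image_vec.t() / temperature                # (B, B)
    targets = torch.arange(sim.size(0), device=sim.device)
    contrastive = (F.cross_entropy(sim, targets) +
                   F.cross_entropy(sim.t(), targets)) / 2
    return denoise + contrastive

if __name__ == "__main__":
    model = SharedMultimodalEncoder()
    token_ids = torch.randint(0, 30522, (4, 16))
    mask_positions = torch.zeros(4, 16, dtype=torch.bool)
    mask_positions[:, :3] = True                                # pretend first 3 tokens are masked
    masked_ids = token_ids.clone()
    masked_ids[mask_positions] = 103                            # BERT-style [MASK] id (assumption)
    image_feats = torch.randn(4, 36, 2048)                      # e.g. 36 detected region features
    print(s5_loss(model, token_ids, masked_ids, mask_positions, image_feats))
```

The two terms play the roles the abstract assigns them: the masked-reconstruction loss forces the shared encoder to model fine-grained cross-modal context, while the contrastive loss imposes the coarse semantic alignment prior on the pooled embeddings.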