- TL;DR: We explored how a novel method of compositional set embeddings can both perceive and represent not just a single class but an entire set of classes that is associated with the input data.
- Abstract: We explore the idea of compositional set embeddings that can be used to infer not just a single class, but the set of classes associated with the input data (e.g., image, video, audio signal). This can be useful, for example, in multi-object detection in images, or multi-speaker diarization (one-shot learning) in audio. In particular, we devise and implement two novel models consisting of (1) an embedding function f trained jointly with a “composite” function g that computes set union opera- tions between the classes encoded in two embedding vectors; and (2) embedding f trained jointly with a “query” function h that computes whether the classes en- coded in one embedding subsume the classes encoded in another embedding. In contrast to prior work, these models must both perceive the classes associated with the input examples, and also encode the relationships between different class label sets. In experiments conducted on simulated data, OmniGlot, and COCO datasets, the proposed composite embedding models outperform baselines based on traditional embedding approaches.
- Code: https://drive.google.com/open?id=1zjsK9DP3CUqwcVSNwDPshIxOV5hQwFxt
- Keywords: Embedding, One-shot Learning, Compositional Representation