Abstract: With the growing popularity of large language models (LLMs) and few-shot learners, metric space learning based on deep neural networks has attracted considerable attention. Metric space learning trains embedding models to produce feature embeddings and then computes the similarity between feature vectors in the learned embedding space for various downstream tasks, such as multi-modal embedding learning for cross-modal retrieval. However, existing methods that rely on cosine-based similarity in the metric space suffer from several limitations. First, computing the cosine similarity between high-dimensional vectors requires the two vectors to have identical shapes. Second, computing the similarity over a concatenation layer may misalign the vector elements, because the concatenation layer may not preserve the positions of the elements in the two vectors, which lowers learning success rates. This paper reviews the limitations of metric space similarity methods, such as cosine similarity and the concatenation layer, and offers insights into solution approaches we have developed to address these limitations.
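As a toy illustration of the two limitations named above (not the authors' method), the sketch below computes cosine similarity in plain Python. It rejects vectors of different lengths, and it shows that concatenating the same two sub-embeddings in a different order can yield a similarity of zero. The function name `cosine_similarity` and the example vectors are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; both must have the same length."""
    if len(u) != len(v):
        # The dot product pairs elements by position, so the two
        # embeddings must share the same dimensionality.
        raise ValueError(f"shape mismatch: {len(u)} vs {len(v)}")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical sub-embeddings from two modalities (toy values).
text_vec = [1.0, 0.0]
image_vec = [0.0, 1.0]

# Same content, different concatenation order: elements no longer line up.
pair_a = text_vec + image_vec   # [1.0, 0.0, 0.0, 1.0]
pair_b = image_vec + text_vec   # [0.0, 1.0, 1.0, 0.0]

print(round(cosine_similarity(pair_a, pair_a), 6))  # 1.0
print(cosine_similarity(pair_a, pair_b))            # 0.0
```

Because cosine similarity compares elements strictly by position, any mismatch in how a concatenation layer orders its inputs changes the score even when the underlying information is identical.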