Towards Faster and Stronger Deep Earth Mover's Distance for Few-Shot Learning

15 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Few-shot learning; Visual recognition; Earth Mover's Distance; Gaussian distribution
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We show that the EMD between Gaussians amounts to optimal feature matching, and propose, for the first time, to learn a parametric EMD metric for Gaussians. Our efforts lead to an effective and efficient EMD for few-shot learning.
Abstract: Recent works in few-shot learning (FSL) for visual recognition have indicated that dense features benefit representation learning across novel categories. A particularly interesting method is DeepEMD, which formalizes classification as optimal matching of dense features under an effective statistical distance, the Earth Mover's Distance (EMD). Despite its competitive performance, DeepEMD is computationally expensive due to the linear programming it requires. To address this problem, we propose a metric-based Gaussian EMD (GEMD-M) for FSL. We adopt Gaussians to model feature distributions and use the closed-form EMD between Gaussians as a dissimilarity measure. We show that this metric amounts to feature matching, in which the optimal matching flows follow a joint Gaussian and can be expressed analytically. As the distance in GEMD-M is entangled and not GPU-friendly, we further present a transfer learning-based Gaussian EMD (GEMD-T). The key idea is to learn a parametric EMD, yielding a more discriminative metric based on the square roots of covariance matrices (via learnable orthogonal matrices) and mean vectors. The learnable metric in GEMD-T is decoupled and can thus be implemented as a fully connected layer followed by a softmax classifier, which is well suited to GPUs. We conduct extensive experiments on the large-scale Meta-Dataset and three small-scale benchmarks. The results show that our GEMD is superior to DeepEMD and achieves compelling performance compared with state-of-the-art methods.
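
For reference, with a squared-Euclidean ground cost the EMD between two Gaussians N(mu1, Sigma1) and N(mu2, Sigma2) has the well-known closed form W2^2 = ||mu1 - mu2||^2 + Tr(Sigma1 + Sigma2 - 2 (Sigma2^{1/2} Sigma1 Sigma2^{1/2})^{1/2}). Below is a minimal NumPy/SciPy sketch of this distance applied to Gaussians fitted on dense features; the function gaussian_emd, the toy feature shapes, and the random data are illustrative assumptions, not the paper's GEMD-M implementation.

import numpy as np
from scipy.linalg import sqrtm

def gaussian_emd(mu1, cov1, mu2, cov2):
    # Closed-form EMD (2-Wasserstein distance) between N(mu1, cov1) and N(mu2, cov2):
    # W2^2 = ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov2^{1/2} cov1 cov2^{1/2})^{1/2})
    sqrt_cov2 = np.real(sqrtm(cov2))                       # matrix square root of cov2
    cross = np.real(sqrtm(sqrt_cov2 @ cov1 @ sqrt_cov2))   # coupling term
    w2_sq = np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2.0 * cross)
    return float(np.sqrt(max(w2_sq, 0.0)))                 # clamp tiny negative round-off

# Toy usage: Gaussians fitted to dense (spatial) features of a query image and a support class.
rng = np.random.default_rng(0)
query_feats = rng.normal(size=(49, 16))          # e.g., 7x7 feature map with 16-dim channels
support_feats = rng.normal(size=(49, 16)) + 0.5  # shifted to mimic a different class
dist = gaussian_emd(query_feats.mean(axis=0), np.cov(query_feats, rowvar=False),
                    support_feats.mean(axis=0), np.cov(support_feats, rowvar=False))
print(f"Gaussian EMD between query and support: {dist:.3f}")

Unlike the per-pair linear program in DeepEMD, evaluating this distance only requires a few small matrix square roots, which is the source of the efficiency gain claimed in the abstract.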
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 188