Large Scale Metric Learning

Zay Maung Maung Aye, Kotagiri Ramamohanarao, Benjamin I. P. Rubinstein

IJCNN 2016 (modified: 08 Nov 2022)

Abstract: Many machine learning and pattern recognition algorithms rely heavily on good distance metrics to achieve competitive performance. While distance metrics can be learned, doing so is currently computationally infeasible on large datasets. In this paper, we propose two efficient and effective approaches for selecting the training set to accelerate metric learning: Locality-Sensitive Hashing (LSH) with discriminative information, and K-Means clustering inside LSH buckets. Our methods yield a speedup factor of (N/C)^2, where N is the training set size and C ≪ N is the user-selected compressed set size. This quadratic speedup is often realized as an improvement of one to two or more orders of magnitude, with little degradation in accuracy. For example, our generic filter approach enables the current overall fastest Large Margin Nearest Neighbor (LMNN) to learn metrics on one million samples in 6.8 minutes, down from 5.4 hours, a 48x speedup. LMNN and similar state-of-the-art methods use tree data structures to speed up nearest-neighbor queries, an advantage that degrades at higher dimensions. Our approach does not share this limitation.
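The abstract does not give algorithmic details, so the following is only a rough sketch of the general idea of compressing a training set with K-Means inside LSH buckets, not the authors' method. It assumes random-hyperplane LSH, treats "discriminative information" as simply clustering each class separately within a bucket, and allots representatives in proportion to group size. All function names (`lsh_bucket_ids`, `kmeans_representatives`, `compress_training_set`) and parameters (`n_bits`, `n_iter`, `seed`) are hypothetical.

```python
import numpy as np

def lsh_bucket_ids(X, n_bits, rng):
    """Random-hyperplane LSH (an assumed hash family): one integer bucket id per row."""
    planes = rng.standard_normal((X.shape[1], n_bits))
    bits = (X @ planes) > 0
    return bits.astype(np.int64) @ (1 << np.arange(n_bits, dtype=np.int64))

def kmeans_representatives(X, k, rng, n_iter=10):
    """Plain Lloyd's k-means; returns row indices of the points nearest each centroid."""
    k = min(k, len(X))
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # leave an empty cluster's centroid where it is
                centers[j] = members.mean(axis=0)
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.unique(dist.argmin(axis=0))

def compress_training_set(X, y, C, n_bits=4, seed=0):
    """Select roughly C row indices: hash into buckets, then cluster each class per bucket."""
    rng = np.random.default_rng(seed)
    buckets = lsh_bucket_ids(X, n_bits, rng)
    n_total = len(X)
    keep = []
    for b in np.unique(buckets):
        for label in np.unique(y):
            idx = np.flatnonzero((buckets == b) & (y == label))
            if len(idx) == 0:
                continue
            # Assumed allocation rule: representatives proportional to group size.
            k = max(1, round(C * len(idx) / n_total))
            keep.extend(idx[kmeans_representatives(X[idx], k, rng)])
    return np.array(sorted(keep))

# Toy data: two Gaussian classes in five dimensions.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0, 1, (500, 5)), data_rng.normal(3, 1, (500, 5))])
y = np.repeat([0, 1], 500)

selected = compress_training_set(X, y, C=100)
print("kept", len(selected), "of", len(X), "samples")
print("nominal (N/C)^2 factor:", (len(X) / len(selected)) ** 2)
```

The selected subset `X[selected], y[selected]` would then be passed to a metric learner in place of the full data. The printed factor is only the nominal (N/C)^2 ratio from the abstract; the number of kept samples is approximately, not exactly, `C`, and the actual wall-clock gain depends on the metric learner used.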
