Abstract: We consider the Euclidean bi-chromatic matching problem in the dynamic setting, where the goal is to efficiently process point insertions and deletions while maintaining a high-quality solution. Computing the minimum cost bi-chromatic matching is one of the core problems in geometric optimization that has found many applications, most notably in estimating Wasserstein distance between two distributions. In this work, we present the first fully dynamic algorithm for Euclidean bi-chromatic matching with sublinear update time. For any fixed $\varepsilon > 0$, our algorithm achieves an $O(1/\varepsilon)$-approximation and handles updates in $O(n^{\varepsilon})$ time. Our experiments show that our algorithm enables effective monitoring of distributional drift in the Wasserstein distance on real and synthetic data sets, while outperforming the runtime of baseline approximations by orders of magnitude.
Lay Summary: Consider the following problem: you are given the same number of red and blue points on a plane, and you need to pair (match) these two sets by drawing a line between each red-blue pair. Your goal is to use as little of your pencil as possible: the total length of all the lines should be as short as possible. Surprisingly, this simple-sounding problem is closely connected to an important task in machine learning: deciding how similar two data sets are. However, data sets, like our red and blue points, can change over time, meaning that red or blue points may appear or disappear.
We designed a fast algorithm that, after each such change, quickly returns a pairing that is almost as good as the best possible matching. You might wonder why our algorithm doesn't always return the absolute best pairing. Interestingly, we proved that, in a certain sense, it's theoretically impossible to design a fast algorithm that always finds the best pairing, or even one that is much better than the one our algorithm produces.
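To make the problem statement concrete, here is a minimal sketch of the static version of the problem: an exact brute-force solver for a handful of points. It is not the dynamic algorithm described in the abstract; the function names and the sample points are hypothetical, and the approach tries every pairing, so it is only practical for very small inputs.

```python
import itertools
import math


def matching_cost(red, blue, perm):
    """Total Euclidean length when red[i] is paired with blue[perm[i]]."""
    return sum(math.dist(red[i], blue[j]) for i, j in enumerate(perm))


def min_cost_bichromatic_matching(red, blue):
    """Exact minimum-cost red-blue matching by exhaustive search.

    Illustrative baseline only: it enumerates all n! pairings.
    """
    if len(red) != len(blue):
        raise ValueError("red and blue must contain the same number of points")
    best_perm = min(
        itertools.permutations(range(len(blue))),
        key=lambda perm: matching_cost(red, blue, perm),
    )
    pairs = list(enumerate(best_perm))  # (red index, blue index)
    return pairs, matching_cost(red, blue, best_perm)


# Hypothetical sample data: two red and two blue points in the plane.
red = [(0.0, 0.0), (4.0, 0.0)]
blue = [(4.0, 3.0), (0.0, 3.0)]

pairs, cost = min_cost_bichromatic_matching(red, blue)
print(pairs, cost)

# For two equal-size point sets with uniform weights, the 1-Wasserstein
# distance is the minimum matching cost divided by the number of points.
print(cost / len(red))
```

Recomputing this from scratch after every insertion or deletion is what a dynamic algorithm is meant to avoid; the sketch only pins down the quantity being approximated.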
Primary Area: General Machine Learning->Everything Else
Keywords: Euclidean bi-chromatic matching, dynamic algorithm, 1-Wasserstein distance
Submission Number: 11181