TL;DR: We propose MetricEmbedding, which accelerates the Metric Nearness Problem using the tropical inner product, achieving up to 60× faster computation and significantly lower memory usage while preserving metric properties in large-scale matrices.
Abstract: The Metric Nearness Problem involves restoring a non-metric matrix to its closest metric-compliant form, addressing issues such as noise, missing values, and data inconsistencies. Enforcing metric properties, in particular the $O(N^3)$ triangle inequality constraints, is computationally challenging, especially at large scale, where traditional methods suffer from high time and space complexity. We propose a novel solution based on the tropical inner product (max-plus operation), which we prove satisfies the triangle inequality for non-negative real matrices. By transforming the problem into a continuous optimization task, our method directly minimizes the distance to the target matrix. This approach not only restores metric properties but also generates metric-preserving embeddings, enabling real-time updates and reducing computational and storage overhead for downstream tasks. Experimental results demonstrate that our method achieves speedups of up to 60× over state-of-the-art approaches and scales efficiently from $10^4 \times 10^4$ to $10^5 \times 10^5$ matrices with significantly lower memory usage.
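To make the max-plus idea concrete, here is a minimal sketch in plain Python. It assumes one natural reading of the tropical inner product, $D_{ij} = \max_k (x_{ik} + x_{jk})$ for $i \neq j$ with a zero diagonal and a non-negative embedding matrix $X$; the abstract does not state the exact formula, so the function names `tropical_distance` and `max_triangle_violation`, the zero-diagonal convention, and the random test data are all illustrative assumptions rather than the authors' implementation. Under this reading the triangle inequality follows directly: if $k^*$ attains the maximum for $D_{ij}$, then $D_{il} + D_{lj} \ge x_{ik^*} + x_{jk^*} + 2x_{lk^*} \ge D_{ij}$ whenever $x_{lk^*} \ge 0$.

```python
import random


def tropical_distance(X):
    """Assumed form: D[i][j] = max_k (X[i][k] + X[j][k]) for i != j, 0 on the diagonal."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = max(a + b for a, b in zip(X[i], X[j]))
            D[i][j] = D[j][i] = d  # symmetric by construction
    return D


def max_triangle_violation(D):
    """Brute-force O(N^3) check: largest D[i][j] - (D[i][k] + D[k][j]), floored at 0."""
    n = len(D)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                worst = max(worst, D[i][j] - (D[i][k] + D[k][j]))
    return worst


# Hypothetical data: 30 points with 4-dimensional non-negative embeddings.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(30)]
D = tropical_distance(X)
violation = max_triangle_violation(D)
print("max triangle violation:", violation)
```

The brute-force checker is only there to illustrate the $O(N^3)$ constraint count mentioned above. The point of the construction is that a matrix built this way needs no such check, and that only the $N \times d$ embedding has to be stored rather than the full $N \times N$ matrix.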
Lay Summary: (1) Large-scale data may violate basic metric rules due to noise or missing values, which can degrade the performance of downstream tasks.
(2) We propose a new method that uses the tropical inner product to restore metric properties while significantly improving processing efficiency.
(3) This enables faster handling of large datasets and provides a more reliable foundation for tasks like search and recommendation.
Primary Area: General Machine Learning
Keywords: metric nearness problem, metric, triangle inequality, tropical inner product
Submission Number: 11898