Interpretable Embedding and Visualization of Compressed Data

Published: 2023, Last Modified: 06 Feb 2025ACM Trans. Knowl. Discov. Data 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Traditional embedding methodologies, also known as dimensionality reduction techniques, assume the availability of exact pairwise distances between the high-dimensional objects that will be embedded in a lower dimensionality. In this article, we propose an embedding that overcomes this limitation and can operate on pairwise distances that are represented as a range of lower and upper bounds. Such bounds are typically estimated when objects are compressed in a lossy manner, so our approach is highly applicable in the case of big compressed datasets. Our methodology can preserve multiple aspects of the original data relationships: distances, correlations, and object scores/ranks, whereas existing techniques typically preserve only distances. Comparative experiments with prevalent embedding methodologies (ISOMAP, t-SNE, MDS, UMAP) illustrate that our approach can provide fidelitous preservation of multiple object relationships, even in the presence of inexact distance information. Our visualization method is also easily interpretable.
Loading