Joint Hierarchical Representation Learning of Samples and Features via Informed Tree-Wasserstein Distance
Keywords: hierarchical representation learning, unsupervised learning, tree-Wasserstein distance, diffusion geometry, hyperbolic embeddings, Haar wavelet, manifold learning, hyperbolic graph convolutional networks
TL;DR: We introduce an iterative framework to jointly learn hierarchical representations of both samples and features using tree-Wasserstein distance and data-driven Haar wavelet filters.
Abstract: High-dimensional data often exhibit hierarchical structures in both modes: samples and features. Yet, most existing approaches for hierarchical representation learning consider only one mode at a time.
In this work, we propose an unsupervised method for jointly learning hierarchical representations of samples and features via tree-Wasserstein distance (TWD).
Our method alternates between the two data modes: it first constructs a tree for one mode, then computes a TWD for the other mode based on that tree, and finally uses the resulting TWD to build the second mode's tree. Repeating these alternating steps gradually refines both trees and the corresponding TWDs, capturing meaningful hierarchical representations of the data.
We provide a theoretical analysis showing that our method converges.
We show that our method can be integrated into hyperbolic graph convolutional networks as a pre-processing technique, improving performance in link prediction and node classification tasks.
In addition, our method outperforms baselines in sparse approximation and unsupervised Wasserstein distance learning tasks on word-document and single-cell RNA-sequencing datasets.
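The alternating scheme described in the abstract can be sketched in code. This is a minimal, hypothetical illustration only: it uses off-the-shelf average-linkage trees and the standard tree-Wasserstein formula (sum over tree edges of edge weight times the absolute difference in subtree mass) as stand-ins for the paper's data-driven tree construction and Haar wavelet filters, which are not specified here. All function names are invented for this sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

def tree_wasserstein_embedding(dist, P):
    """Embed distributions so L1 distances between embeddings equal TWDs.

    dist: (n, n) symmetric distance matrix over the tree's leaves.
    P:    (n, k) matrix whose k columns are distributions over the n leaves.
    Returns a (k, 2n-2) array; the cityblock distance between rows a and b
    is the tree-Wasserstein distance between distributions a and b on an
    average-linkage tree built from `dist` (a stand-in tree construction).
    """
    n = dist.shape[0]
    Z = linkage(squareform(dist, checks=False), method="average")
    # Node heights: leaves sit at 0, internal node n+i at merge height Z[i, 2].
    heights = np.concatenate([np.zeros(n), Z[:, 2]])
    # Subtree mass of every node for each of the k distributions.
    mass = np.zeros((2 * n - 1, P.shape[1]))
    mass[:n] = P
    # Edge weight of each non-root node = parent height - own height.
    w = np.zeros(2 * n - 2)
    for i in range(n - 1):
        a, b = int(Z[i, 0]), int(Z[i, 1])
        mass[n + i] = mass[a] + mass[b]
        w[a] = heights[n + i] - heights[a]
        w[b] = heights[n + i] - heights[b]
    return mass[: 2 * n - 2].T * w

def alternate_twd(X, n_iters=3):
    """Alternate between sample and feature trees (hypothetical sketch).

    X: nonnegative (n_samples, n_features) data matrix.
    Each iteration: sample tree -> TWD between features -> feature tree
    -> TWD between samples, refining both distance matrices.
    """
    d_samples = squareform(pdist(X))  # initial sample distances (Euclidean)
    for _ in range(n_iters):
        # Treat each feature as a distribution over samples.
        P_feat = X / X.sum(axis=0, keepdims=True)
        emb = tree_wasserstein_embedding(d_samples, P_feat)
        d_features = squareform(pdist(emb, metric="cityblock"))
        # Treat each sample as a distribution over features.
        P_samp = (X / X.sum(axis=1, keepdims=True)).T
        emb = tree_wasserstein_embedding(d_features, P_samp)
        d_samples = squareform(pdist(emb, metric="cityblock"))
    return d_samples, d_features
```

Usage: `d_s, d_f = alternate_twd(X)` on a nonnegative matrix `X` returns refined sample-sample and feature-feature TWD matrices. Average linkage is chosen here only because it produces a monotone hierarchy (so all edge weights are nonnegative); the paper's actual tree construction differs.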
Primary Area: General machine learning (supervised, unsupervised, online, active, etc.)
Submission Number: 16201