Mix2Vec: Unsupervised Mixed Data Representation

Chengzhang Zhu, Qi Zhang, Longbing Cao, Arman Abrahamyan

2020 (modified: 20 May 2022), DSAA 2020

Abstract: Unsupervised representation learning on mixed data is highly challenging yet rarely explored. It must cope with issues common in real-life mixed data, including the sparsity, dynamics and heterogeneity of attributes and values. This work introduces Mix2Vec, an effective and efficient unsupervised deep neural representer that automatically learns a universal representation of dynamic mixed data with these characteristics. Mix2Vec relies on three mechanisms to address these issues: random shuffling prediction, prior distribution matching, and structural informativeness maximization. It converts complex mixed data into vector-space representations that are universal and comparable across all data objects, and transparent and reusable for both unsupervised and supervised learning tasks. Extensive experiments on four large mixed datasets demonstrate that Mix2Vec performs significantly better than state-of-the-art deep representation methods. We also empirically verify the designed mechanisms in terms of representation quality, visualization, and their ability to improve the performance of downstream tasks.
