Keywords: customer embeddings, tabular, general purpose, autoencoder, representation learning, reconstruction loss, entity embedding, entity representation, contractive autoencoder, dimensionality reduction, latent space, representation, feature regularization, variational autoencoder
TL;DR: We introduce DeepCAE, an enhanced multi-layer contractive autoencoder, and benchmark autoencoder architectures in a general-purpose tabular data embedding framework for reconstruction and downstream performance, achieving a 34% reconstruction improvement over stacked CAEs.
Abstract: Recent advances in representation learning have successfully leveraged the underlying domain-specific structure of data across various fields. However, representing diverse and complex entities stored in tabular format within a latent space remains challenging.
In this paper, we introduce DeepCAE, a novel method for calculating the regularization term for multi-layer contractive autoencoders (CAEs). Additionally, we formalize a general-purpose entity embedding framework and use it to empirically show that DeepCAE outperforms all other tested autoencoder variants in both reconstruction and downstream prediction performance. Notably, when compared to a stacked CAE across 13 datasets, DeepCAE achieves a 34% improvement in reconstruction error.
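The abstract does not spell out the regularizer, so for readers unfamiliar with contractive autoencoders, the sketch below illustrates one plausible reading of "calculating the regularization term for multi-layer CAEs": penalizing the squared Frobenius norm of the Jacobian of the *entire* encoder, accumulated layer by layer via the chain rule, rather than summing per-layer penalties as a stacked CAE does. The class name `MultiLayerCAE`, the layer sizes, the sigmoid activations, and the loss weighting are our assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiLayerCAE(nn.Module):
    """Hypothetical multi-layer contractive autoencoder (illustrative sketch only)."""

    def __init__(self, dims):  # e.g. dims = [64, 32, 8]: input -> hidden -> code
        super().__init__()
        n = len(dims) - 1
        self.enc = nn.ModuleList([nn.Linear(dims[i], dims[i + 1]) for i in range(n)])
        self.dec = nn.ModuleList([nn.Linear(dims[i + 1], dims[i]) for i in reversed(range(n))])

    def encode(self, x):
        # Accumulate the Jacobian of the *whole* encoder via the chain rule:
        # J <- diag(sigma'(a_k)) @ W_k @ J for each layer k, starting from J = I.
        jac = torch.eye(x.shape[1], device=x.device).expand(x.shape[0], -1, -1)
        h = x
        for layer in self.enc:
            h = torch.sigmoid(layer(h))
            d = (h * (1.0 - h)).unsqueeze(2)              # sigma'(a) as (batch, out, 1)
            jac = (d * layer.weight.unsqueeze(0)) @ jac   # shape: (batch, out, in_dim)
        return h, jac

    def forward(self, x):
        z, jac = self.encode(x)
        h = z
        for layer in self.dec[:-1]:
            h = torch.sigmoid(layer(h))
        return self.dec[-1](h), jac                       # linear output layer

def cae_loss(model, x, lam=1e-3):
    # Reconstruction error plus the contractive penalty ||J_f(x)||_F^2,
    # where J_f is the Jacobian of the full encoder (not a sum of per-layer terms).
    recon, jac = model(x)
    return ((recon - x) ** 2).mean() + lam * (jac ** 2).sum(dim=(1, 2)).mean()

model = MultiLayerCAE([64, 32, 8])
x = torch.rand(16, 64)   # a batch of 16 tabular rows
cae_loss(model, x).backward()
```

By contrast, a stacked CAE sums per-layer penalties of the form ||diag(sigma'(a_k)) W_k||_F^2, which ignores how contraction composes across layers; penalizing the full encoder Jacobian is one natural way a multi-layer formulation could improve on that, under the assumptions stated above.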
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9893