Keywords: Information-theoretic generalization error analysis, generalization error analysis, VQ-VAE
Abstract: Encoder--decoder models, which transform input data into latent variables, have achieved significant success in machine learning. Although the generalization capability of these models has been theoretically analyzed in supervised learning, with a focus on the complexity of latent variables, the contribution of latent variables to generalization and data generation capabilities is less explored theoretically in unsupervised learning. To address this gap, our study leverages information-theoretic generalization error analysis (IT analysis). Using the supersample setting from recent IT analysis, we demonstrate that the generalization gap for the reconstruction loss can be evaluated through mutual information related to the posterior distribution of latent variables conditioned on the input data, without relying on the decoder's information. We also introduce a novel permutation-symmetric supersample setting, which extends the existing IT analysis and shows that regularizing the encoder's capacity leads to generalization. Finally, we provide a guarantee on the 2-Wasserstein distance between the true data distribution and the generated data distribution, offering insight into the model's data generation capability.
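For context, a minimal sketch of the standard conditional-mutual-information (CMI) bound in the supersample setting of Steinke & Zakynthinou (2020), on which this line of IT analysis builds. The notation here (supersample $\tilde{Z}$, selector $U$, hypothesis $W$, bounded loss $\ell$) is the standard one from that literature, not taken from this submission; the paper's stated contribution is to replace the hypothesis $W$ with quantities derived from the encoder's posterior over latent variables, whose exact form is not given in the abstract.

% Supersample setting: \tilde{Z} \in \mathcal{Z}^{n \times 2} holds 2n i.i.d. draws
% from \mu; the selector U \sim \mathrm{Unif}(\{0,1\}^n) picks one column per row
% as the training set \tilde{Z}_U; W = \mathcal{A}(\tilde{Z}_U) is the learned
% hypothesis; the loss \ell is assumed bounded in [0,1].
\left| \mathbb{E}\bigl[ L_{\mu}(W) - L_{\tilde{Z}_U}(W) \bigr] \right|
  \;\le\; \sqrt{\frac{2\, I(W; U \mid \tilde{Z})}{n}}

The abstract's claim that the bound does not rely on the decoder's information then corresponds, under this reading, to the mutual information term depending only on the encoder's posterior rather than on the full hypothesis $W$.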
Supplementary Material: zip
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4620