Why should autoencoders work?

Published: 24 Feb 2024, Last Modified: 24 Feb 2024, Accepted by TMLR
Abstract: Deep neural network autoencoders are routinely used computationally for model reduction. They allow recognizing the intrinsic dimension of data that lie in a $k$-dimensional subset $K$ of an input Euclidean space $\mathbb{R}^n$. The underlying idea is to obtain both an encoding layer that maps $\mathbb{R}^n$ into $\mathbb{R}^k$ (called the bottleneck layer or the space of latent variables) and a decoding layer that maps $\mathbb{R}^k$ back into $\mathbb{R}^n$, in such a way that the input data from the set $K$ is recovered when composing the two maps. This is achieved by adjusting parameters (weights) in the network to minimize the discrepancy between the input and the reconstructed output. Since neural networks (with continuous activation functions) compute continuous maps, the existence of a network that achieves perfect reconstruction would imply that $K$ is homeomorphic to a $k$-dimensional subset of $\mathbb{R}^k$, so clearly there are topological obstructions to finding such a network. On the other hand, in practice the technique is found to ``work'' well, which leads one to ask if there is a way to explain this effectiveness. We show that, up to small errors, indeed the method is guaranteed to work. This is done by appealing to certain facts from differential topology. A computational example is also included to illustrate the ideas.
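The encoder/decoder setup described in the abstract can be illustrated with a minimal sketch (ours, not from the paper). For data lying exactly on a $k$-dimensional *linear* subspace $K \subset \mathbb{R}^n$, a linear autoencoder admits a closed-form solution via the SVD, and composing the encoder and decoder recovers the inputs; all variable names here are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's method): a linear autoencoder for
# data lying on a k-dimensional linear subspace K of R^n. The optimal
# encoder/decoder pair is obtained in closed form from the SVD instead of
# by gradient descent, but it shows the same objective: compose the
# encoding map R^n -> R^k with the decoding map R^k -> R^n so that
# inputs from K are recovered.
rng = np.random.default_rng(0)
n, k, m = 5, 2, 200                       # ambient dim, latent dim, samples
basis = rng.standard_normal((n, k))       # K = column span of `basis`
X = rng.standard_normal((m, k)) @ basis.T # m points in K, as rows in R^n

# Top-k right singular vectors of X span K and give the optimal
# linear encoder (rows of W) and decoder (W transposed).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k]             # encoder: z = x @ W.T maps R^n -> R^k
latent = X @ W.T       # bottleneck representation, shape (m, k)
X_rec = latent @ W     # decoder: R^k -> R^n

# Reconstruction error is near machine precision, since the data lie
# exactly in a rank-k subspace.
print(float(np.max(np.abs(X - X_rec))))
```

For nonlinear sets $K$, the encoder and decoder are deep networks trained by minimizing the reconstruction loss, and, as the abstract notes, exact recovery can be obstructed topologically; the paper's point is that recovery up to small errors is nevertheless guaranteed.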
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Reply to REVIEWER LRt6: We sincerely appreciate the reviewer's thoughtful additional feedback and suggestions. We have uploaded a new revision to incorporate them. The suggestions that we did not implement, or did not implement in an entirely straightforward way, are described below.
- We added the reviewer's suggested "game plan" for proving Theorem 1 to the start of Section 2. However, since we feel that the game plan makes clear the roles of the lemmas used to prove Theorem 1, we limited ourselves to adding brief expository remarks before the lemmas.
- Regarding Lemma 3, the reviewer asked about compactness of $C$, and about finiteness of $S$ versus finiteness of the number of components of $K$. We had made a typo in writing "compactness of $C$", which should say "compactness of $K$"; we fixed this in the revision and sincerely thank the reviewer for helping us notice the error. To clarify, both finiteness of $S$ and compactness of $K$ (which implies compactness of the closed subset $C\subseteq K$, as well as finiteness of the number of components of $K$) are needed.
- The reviewer asked us to move Remark 8 in Section 4. After carefully reviewing the suggestion, we feel that the text flows best with the remark in its current location, but we are very open to discussing this with the reviewer and editor.
- The reviewer also provided an additional remark on the value of briefly mentioning the use of the latent space to "walk along" the data manifold, e.g., for purposes of interpolation. We thank the reviewer for convincing us of the importance of doing so, and we have added a brief mention along these lines to Remark 3.
- Since the change just mentioned addresses "walking along" the data manifold, and since we feel that the reviewer's related comment about combining multiple AEs is beyond the scope of our paper, we would prefer not to say anything about the latter.
- We thank the reviewer for the suggestion to shorten the discussion, but we would prefer to see what the editor decides about this.
- Finally, we removed one plot (showing a different view of the data, and hence somewhat redundant), not because it was wrong, but simply to keep the page count within 12 pages.
Assigned Action Editor: ~Jeffrey_Pennington1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1682