Abstract: The Langevin algorithm is a classic method for sampling from a given probability density function (pdf) on a real space. In its basic version, it only requires knowledge of the gradient of the log-density, also called the score function. However, in deep learning, it is often easier to learn the so-called "noisy-data score function", i.e. the gradient of the log-density of noisy data, more precisely of data to which Gaussian noise has been added. Such an estimate is biased and complicates the use of the Langevin method. Here, we propose a noise-corrected version of the Langevin algorithm, where the bias due to noisy data is removed, at least regarding first-order terms. Unlike diffusion models, our algorithm needs to know the noisy-data score function for a single noise level only. We further propose a simple special case which has an interesting intuitive interpretation: iteratively adding noise to the data and then attempting to remove half of that noise.
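To make the updates concrete, here is a minimal sketch in NumPy. The names `score` and `score_sigma` are hypothetical stand-ins for an exact score function and a learned noisy-data score function, and the special-case step is our reading of the "add noise, then remove half of it" interpretation via a half-strength Tweedie denoising step; it is a sketch under these assumptions, not necessarily the paper's exact update:

```python
import numpy as np

def langevin_step(x, score, step):
    """One step of the basic Langevin algorithm; `score` is assumed to be
    the exact score function, i.e. the gradient of the log-density."""
    noise = np.random.randn(*x.shape)
    return x + 0.5 * step * score(x) + np.sqrt(step) * noise

def noise_corrected_step(x, score_sigma, sigma):
    """One step of the intuitive special case sketched in the abstract:
    add Gaussian noise of standard deviation sigma, then try to remove
    half of it. `score_sigma` approximates the noisy-data score, i.e. the
    gradient of the log-density of data corrupted by N(0, sigma^2) noise
    (e.g. learned by denoising score matching). By Tweedie's formula,
    y + sigma^2 * score_sigma(y) would remove the noise in expectation;
    here only half of that correction is applied."""
    y = x + sigma * np.random.randn(*x.shape)   # add noise
    return y + 0.5 * sigma**2 * score_sigma(y)  # remove half of it

# Toy usage: for data ~ N(0, 1), noisy data is N(0, 1 + sigma^2), so the
# noisy-data score is available in closed form.
sigma = 0.5
score_sigma = lambda y: -y / (1.0 + sigma**2)
x = np.zeros(1)
for _ in range(1000):
    x = noise_corrected_step(x, score_sigma, sigma)
```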
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=qYFaivVgq1
Changes Since Last Submission: We would like to thank the reviewers again for constructive criticism. We have now revised the manuscript as follows:
1. The main revision is a reformulation of the paper's theoretical claim. The claim is no longer that the new algorithm converges in any rigorous sense. Rather, the claim is that the bias introduced by estimating the score from noisy data (i.e. using DSM) is removed. This is a weaker claim, since it is well known that even the basic Langevin algorithm converges only in a somewhat complicated limit sense, with bias remaining for any non-infinitesimal step size. In particular, the bias due to finite step size is not analyzed here, since it is not affected by the proposed modification. This revision is implemented as follows:
a) Section 3.2.2 (previously Section 3.2) has been thoroughly rewritten. In particular, the main theorem has been replaced by a new one. We realized we had been trying to prove the wrong thing, since whether the algorithm converges is irrelevant for the current claim. The new theorem therefore focuses on analyzing in what sense, and to what extent, the bias is removed by the proposed modification of the algorithm.
b) Consequently, a new proof of the (new) main theorem is provided in Appendices C and D.
c) Some wordings, especially in the Conclusion, have been changed to emphasize the revised claim of the paper. The explanation of the simulation results has been slightly reformulated to reflect this new perspective.
d) Unrelated to the above, but further strengthening the analysis, a new semi-heuristic analysis has been added in Section 3.2.1.
2. A proof of Corollary 1 has been added as the new Appendix B, as requested by Reviewer jqkk. The notation in the Corollary has also been clarified in footnote 1.
3. An analysis of the Gaussian case has been added as the new Appendix E, following a suggestion by Reviewer rYQi.
4. Minor points:
a) To avoid terminological confusion, the term "noisy score [function]" has been replaced by "noisy-data score [function]" throughout the paper. This emphasizes that it is the score function of noisy data, *not* a score function to which noise has been added.
b) Some further literature has been discussed, in particular the reference given by Reviewer jqkk, in the last paragraph of "Related work".
c) The definition of the score function in eq. (2) has been clarified regarding notation.
5. To clarify the extent of this revision: neither the algorithm nor the simulations have been changed in any way.
We hope the manuscript will now be considered suitable for publication.
Assigned Action Editor: ~Atsushi_Nitanda1
Submission Number: 4651