Risk Bounds for Mixture Density Estimation on Compact Domains via the h-Lifted Kullback–Leibler Divergence
Abstract: We consider the problem of estimating probability density functions from sample data using a finite mixture of densities from some component class. To this end, we introduce the $h$-lifted Kullback--Leibler (KL) divergence as a generalization of the standard KL divergence and as a criterion for conducting risk minimization. Under a compact support assumption, we prove an $\mathcal{O}(1/\sqrt{n})$ bound on the expected estimation error when using the $h$-lifted KL divergence, which extends the results of Rakhlin et al. (2005, ESAIM: Probability and Statistics, Vol. 9) and Li & Barron (1999, Advances in Neural Information Processing Systems, Vol. 12) to permit the risk bounding of density functions that are not strictly positive. We develop a procedure for computing the corresponding maximum $h$-lifted likelihood estimators ($h$-MLLEs) using the Majorization--Maximization (MM) framework and provide experimental results in support of our theoretical bounds.
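For context, one natural construction consistent with the abstract's description (a sketch of the idea only; the precise definition is given in the paper, and the notation here is ours) lifts both densities by a fixed reference density $h$ that is bounded away from zero on the compact domain $\mathcal{X}$:
$$\mathrm{KL}^{(h)}(f \,\|\, g) \;=\; \int_{\mathcal{X}} \big(f(x) + h(x)\big)\,\log\frac{f(x) + h(x)}{g(x) + h(x)}\,\mathrm{d}x,$$
so that both arguments of the logarithm are bounded below and the divergence remains well defined even when $f$ or $g$ vanishes on part of $\mathcal{X}$. Under this assumed form, $(f+h)/2$ and $(g+h)/2$ are again densities, and Pinsker's inequality yields $\mathrm{KL}^{(h)}(f\|g) = 2\,\mathrm{KL}\big(\tfrac{f+h}{2}\,\big\|\,\tfrac{g+h}{2}\big) \geq \tfrac{1}{4}\|f-g\|_{1}^{2}$, the flavor of total-variation lower bound mentioned in the change list below.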
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=KD1WTwEXHy&noteId=BKzBMK9ldu
Changes Since Last Submission: We have revised the manuscript according to the instructions and comments from the Area Chair and the Reviewers. Specifically, we have made the following changes:
- Reorganized the Preliminary Results section into Appendix C: Auxiliary Proofs for better flow. This appendix now also contains proofs of results presented in Section 2.
- For better exposition, we have moved the proofs of the main results (previously Section 5) to Appendix A: Proofs of Main Results.
- Modified Proposition 3 to include a new result, arising from the reviewer discussions, on lower bounding the lifted KL divergence by the total variation distance.
- Included a new remark (Remark 7) that addresses the scope of our results, elaborating on their extension to general compact spaces rather than just compact subsets of $\mathbb{R}^d$.
- To address all discussion points with the reviewers, we have included Appendix B: Discussions and Remarks Regarding h-MLLEs. These discussions are paraphrased and refined versions of our conversations with the reviewers.
Appendix B includes the following subsections:
- B.1: Elementary Derivations - Fundamental derivations of the minimum risk estimators associated with the h-MLLE construction.
- B.2: Advantages and Limitations - In-depth discussion and comparison of the h-MLLE approach with alternatives in the literature, including the $L^2$ distance, the maximum likelihood method, and related divergence-based loss-minimization techniques.
- B.3: Selection of the Lifting Density Function h - Discussion on the problem of optimally choosing the lifting density h.
- B.4: Discussions Regarding the Sharpness of the Obtained Risk Bound - Comparison of our convergence rates to minimax rates and to those obtained in other works. We also discuss the total variation rate implied by our estimator and provide references on total variation estimation and on how our method can be viewed from the perspective of the $L^1$ literature.
- B.5: The KL Divergence and the MLE - Comparison of our $h$-lifted KL divergence to an alternative construction using the KL divergence of mixtures. We compare the sample estimator constructions arising from the two approaches and their merits.
- B.6: Comparison of the MM Algorithm and the EM Algorithm - Discussion of the MM algorithm proposed in our numerical experiments compared to the equivalent EM algorithm for finite mixture models under the mixture KL divergence construction (an illustrative weight-update sketch appears after this list).
- B.7: Non-Convex Optimization - Discussion of the numerical problem of computing the h-MLLE as a non-convex optimization problem, and why the MM algorithm approach provides advantages compared to other solution methods.
- We have included our author information as well as a link to the paper's GitHub repository.
- We have comprehensively proofread our manuscript, fixing typos identified by the reviewers and others we found.
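As a purely illustrative aid, here is a minimal MM weight update for a finite mixture with fixed components under a lifted sample log-likelihood of the form $\frac{1}{n}\sum_{i=1}^{n}\log\big(\sum_k \pi_k f_k(x_i) + h(x_i)\big)$. This is a toy sketch under the lifted form assumed above, not the paper's h-MLLE procedure (which is specified in the manuscript and repository); all names below are hypothetical.

```python
import numpy as np

def mm_weight_update(F, h_vals, pi):
    """One MM ascent step on sum_i log(sum_k pi[k] * F[i, k] + h_vals[i]).

    Jensen's inequality (treating the lift h as a fixed extra component)
    minorizes the objective by a surrogate that is linear in log(pi);
    maximizing the surrogate over the simplex gives the closed-form
    update below, which never decreases the objective.

    F      : (n, K) array, F[i, k] = f_k(x_i), fixed component densities
    h_vals : (n,) array, h(x_i) > 0, lifting density at the sample points
    pi     : (K,) array, current mixture weights on the simplex
    """
    denom = F @ pi + h_vals              # g(x_i) + h(x_i), shape (n,)
    resp = (F * pi) / denom[:, None]     # responsibilities alpha_{ik}, (n, K)
    col = resp.sum(axis=0)               # sum_i alpha_{ik}, shape (K,)
    return col / col.sum()               # argmax of the surrogate on the simplex

# Toy usage: two bump components on [0, 1] (unnormalized, for brevity)
# with a uniform lift h = 1.
rng = np.random.default_rng(0)
x = rng.uniform(size=500)
F = np.column_stack([np.exp(-50.0 * (x - 0.3) ** 2),
                     np.exp(-50.0 * (x - 0.7) ** 2)])
pi = np.full(2, 0.5)
for _ in range(200):
    pi = mm_weight_update(F, h_vals=np.ones_like(x), pi=pi)
print(pi)
```

The same Jensen step underlies the usual EM weight update for ordinary finite mixtures; the only difference here is the fixed lift term h(x_i) appearing in the denominator of the responsibilities.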
Code: https://github.com/hiendn/LiftedLikelihood
Assigned Action Editor: ~Yair_Carmon1
Submission Number: 2752