Abstract: Real-world datasets are often corrupted by noise. Probabilistic models are well suited to learning in such settings because their inference framework accounts for uncertainty in noisy data samples. Probabilistic models such as the Bayesian Gaussian Process Latent Variable Model (BGPLVM) are widely used in learning problems that emphasize uncertainty. However, despite the promising results of these models, their performance on noise-corrupted, uncertain data has not yet been analyzed analytically. In this paper, we focus on the BGPLVM and quantitatively analyze the performance upper bound of probabilistic models for clustering tasks. We review the BGPLVM and propose an analytic performance upper bound, defined as the minimum probability of false alarm for clustering problems with datasets corrupted by Gaussian noise. This upper bound represents the best performance achievable by any clustering algorithm when the BGPLVM is used for dimensionality reduction. The results derived for Gaussian noise are then generalized to non-Gaussian scenarios. Numerical results validate the proposed performance upper bound of the BGPLVM in clustering tasks with noise-corrupted data. The framework can be generalized to evaluate the performance upper bound of a wide class of probabilistic models.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Bruno_Loureiro1
Submission Number: 3498