Shared Stochastic Gaussian Process Latent Variable Models: A Multi-modal Generative model for Quasar spectra

TMLR Paper3386 Authors

25 Sept 2024 (modified: 29 Nov 2024), Under review for TMLR, CC BY 4.0
Abstract: This work proposes a scalable probabilistic latent variable model based on Gaussian processes in the context of multiple observation spaces. We focus on an application in astrophysics where it is typical for data sets to contain both observed spectral features and scientific properties of astrophysical objects such as galaxies or exoplanets. In our application, we study the spectra of very luminous galaxies known as quasars, together with their properties, such as the mass of their central supermassive black hole, their accretion rate and their luminosity. A single data point is therefore characterised by several classes of observations, each forming its own observation space and possibly having a different likelihood. Our proposed model extends the baseline stochastic variational Gaussian process latent variable model (GPLVM) to this setting: a shared latent space acts as input to different sets of Gaussian process decoders, one for each observation space, so that the quasar spectra and the scientific labels can be generated simultaneously. Further, this framework allows training in the missing data setting, where a large number of dimensions per data point may be unknown or unobserved. We demonstrate high-fidelity reconstructions of the spectra and the scientific labels during test-time inference and briefly discuss the scientific interpretations of the results along with the significance of such a generative model.
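The shared-latent idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it omits the stochastic variational inference and any inducing points, and only shows one latent matrix feeding two independent GP priors (one per observation space), plus a Gaussian log-likelihood that skips missing entries. The RBF kernel, all function names, sizes, lengthscales and noise levels are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(Z1, Z2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of latent points."""
    sq = (np.sum(Z1**2, axis=1)[:, None]
          + np.sum(Z2**2, axis=1)[None, :]
          - 2.0 * Z1 @ Z2.T)
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def sample_view(Z, n_dims, lengthscale, noise_std, rng):
    """Draw one observation space from a GP prior over the shared latents Z."""
    n = Z.shape[0]
    K = rbf_kernel(Z, Z, lengthscale) + 1e-6 * np.eye(n)  # jitter for stability
    L = np.linalg.cholesky(K)
    F = L @ rng.standard_normal((n, n_dims))  # independent GP draw per output dim
    return F + noise_std * rng.standard_normal((n, n_dims))

def masked_gaussian_loglik(Y, F, noise_std):
    """Gaussian log-likelihood summed over observed (non-NaN) entries only."""
    mask = ~np.isnan(Y)
    resid = np.where(mask, Y - F, 0.0)
    ll = (-0.5 * (resid / noise_std) ** 2
          - np.log(noise_std) - 0.5 * np.log(2.0 * np.pi))
    return float(np.sum(ll[mask]))

rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 2))            # shared latent points (hypothetical size)
spectra = sample_view(Z, 100, 1.0, 0.05, rng)  # "spectra" observation space
labels = sample_view(Z, 3, 2.0, 0.1, rng)      # "scientific labels" observation space
```

Because both views are drawn from the same `Z`, objects that are close in latent space have similar spectra and similar labels. Missing dimensions are marked as NaN and contribute nothing to the likelihood, which is the simplest way to mimic training with unobserved entries.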
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Current version includes text revisions and clarifications proposed by the Reviewers.

Current revision includes validation experiments proposed by Reviewer qTSY, enclosed in Appendices C and D: sensitivity to SNR and calibration plots for scientific labels.

Current revision includes changes requested by Reviewer qTSY. Thank you.

Dear Reviewers, The uploaded revision addresses most of the minor changes, typos, equations and structural changes, and rewords the scientific interpretation section. Below, I summarise the additional experiments which I aim to include in a future revision (although I am unsure how much time I have to do so before the decision). All of them seem very reasonable and within scope, so I don't foresee any specific challenges in including them in the manuscript.
1. A traditional regression experiment predicting scientific labels from spectra; this approach would have to be adapted for missing dimensions in the spectra. [Reviewer 8g1J]
2. A baseline model for the Table 2 uncertainty quantification experiments. [Reviewer T1nP]
3. A masked-autoencoder-type benchmark model adapted to the multi-view setting for reconstruction accuracy comparison. [Reviewer T1nP]
4. A calibration plot (ground truth vs. predictions in percentile buckets) to understand prediction quality for the labels. [Reviewer qTSY]
5. A plot of error vs. SNR to establish support for the statement that they are uncorrelated. [Reviewer qTSY]
6. Improve plot resolution and change the '2\sigma' labels to 1.96\sigma or 95% prediction interval. [Reviewer qTSY]
Assigned Action Editor: ~Manuel_Haussmann1
Submission Number: 3386