Latent space Projection Predictive Inference

TMLR Paper1747 Authors

26 Oct 2023 (modified: 20 Feb 2024). Rejected by TMLR.
Abstract: Given a reference model that includes all the available variables, projection predictive inference replaces its posterior with a constrained projection including only a subset of all variables. We extend projection predictive inference to enable computationally efficient variable and structure selection in models outside the exponential family. By adopting a latent space projection predictive perspective we are able to: 1) propose a unified and general framework to do variable selection in complex models while fully honouring the original model structure, 2) properly identify relevant structure and retain posterior uncertainties from the original model, and 3) provide an improved approach also for non-Gaussian models in the exponential family. We demonstrate the superior performance of our approach by thoroughly testing and comparing it against popular variable selection approaches in a wide range of settings, including realistic data sets. Our results show that our approach successfully recovers relevant terms and model structure in complex models, selecting fewer variables than competing approaches for realistic datasets.
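
The projection idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest setting only: the latent predictor is treated as Gaussian with fixed dispersion and the submodel is linear, so the draw-by-draw projection reduces to a least-squares fit of the submodel to each reference-model draw of the latent predictor. The function name `project_draws`, the simulated data, and the stand-in posterior draws are hypothetical and are not taken from the paper.

```python
import numpy as np

def project_draws(eta_ref, X_sub):
    """Project reference-model draws of the latent predictor onto a submodel.

    eta_ref: (S, n) array of S posterior draws of the latent predictor.
    X_sub:   (n, k) design matrix of the candidate submodel.
    Returns an (S, k) array with one least-squares solution per draw.
    Assumption: Gaussian latent predictor with fixed dispersion, so the
    KL-minimising projection of each draw is an ordinary least-squares fit.
    """
    # lstsq accepts a 2-D right-hand side and solves all S problems at once.
    coef, *_ = np.linalg.lstsq(X_sub, eta_ref.T, rcond=None)
    return coef.T

rng = np.random.default_rng(0)
n, p, S = 50, 4, 200
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.0, 0.0])

# Stand-in for posterior draws from a fitted reference model (simulated here).
beta_draws = beta_true + 0.1 * rng.normal(size=(S, p))
eta_ref = beta_draws @ X.T  # (S, n) draws of the latent predictor

# Project onto the submodel that keeps only the first two variables.
beta_proj = project_draws(eta_ref, X[:, :2])
```

Each row of `beta_proj` is the projected coefficient vector for one reference draw, so the spread across rows carries the reference model's posterior uncertainty into the submodel. Projecting onto the full design matrix returns the original draws unchanged.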
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Thanks to all the reviewers for the kind words and constructive suggestions.

General improvements:
- We have added clarifications to Sections 2 and 3.
- We have defined more notation, including $\lambda_\bot$, $q(\lambda_\bot)$, $q_\lambda(\tilde{y})$, $p(\tilde{y}|\lambda_\bot)$, $p(\tilde{\eta}|\lambda_\bot)$.
- We have provided a summary of the complete projection predictive approach, following the suggestion by Reviewer wAtd.
- Due to the space limit and the existing review paper (McLatchie et al.), we did not add a full review of the approach. The focus of the paper is on evaluating the benefits of the latent space projection.
- We have made some of the figures bigger, but due to unfortunate family issues, we were not able to make the improvements suggested by Gp67.

Below are some further responses to specific comments.

> What happens after each optimization problem is solved separately? Are the resulting solutions averaged together?

These are now clarified in Section 2.

> Moreover, can one interchange minimizations and expectations?

Yes, as an approximation, which we now mention explicitly.

> The motivation is unclear: is the interest in prediction itself or parameter estimation?

We have extended the first sentence of the second paragraph to: "We propose an efficient, stable, and information theoretically justified method to make variable selection for non-normal observation models in or beyond the exponential family, while retaining the predictive performance of the full model." and the following bullet points.

> My other main criticism is saying that methods don't exist to fit e.g. ordinal regression with L1 regularization, so just using a normal likelihood. First, I'm not convinced that is true. Second, even if it is it would be trivial to implement this with pytorch or TF and Adam.

To the best of our knowledge they don't exist. We would be happy to learn if they do, and happy to get a pointer on how to implement them with PyTorch or TF and Adam.

> In eq 1, expectations should be wrt a distribution, not RVs. e.g. lambda* ~ p(lambda*|D) would be ideal. This expression is being minimized wrt q, specify that.

Both notations are commonly used, and we opt here to use the more compact notation.

> In eq 5 what are examples of p(eta|lambda,X). Don't we usually just have e.g. eta=X lambda?

Due to space limitations we do not discuss other options, such as Gaussian processes (Piironen and Vehtari, 2016).

> pg 4: "thresholding based on posterior"... if we just care about prediction why threshold? decoupling

Thresholding is something that others do for variable selection. We do not like it, but mention it when referring to related methods.

> Define ELPD.

The ELPD acronym is defined when it is first used (at the beginning of Section 5.2).
Assigned Action Editor: ~Matthew_J._Holland1
Submission Number: 1747