Multi-objective Bayesian Optimization with Heuristic Objectives for Biomedical and Molecular Data Analysis Workflows
Abstract: Many practical applications require the optimization of multiple, computationally expensive, and possibly competing objectives, a setting well suited to multi-objective Bayesian optimization (MOBO) procedures. However, for many types of biomedical data, measures of data analysis workflow success are often heuristic, and it is therefore not known a priori which objectives are useful. Thus, MOBO methods that return the full Pareto front may be suboptimal in these cases. Here we propose a novel MOBO method that adaptively updates the scalarization function using properties of the posterior of a multi-output Gaussian process surrogate function. This approach selects useful objectives based on a flexible set of desirable criteria, allowing the functional form of each objective to guide optimization. We demonstrate the qualitative behaviour of our method on toy data and perform proof-of-concept analyses of single-cell RNA sequencing and highly multiplexed imaging datasets.
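Below is a minimal, illustrative Python sketch of the adaptive-scalarization loop described in the abstract. It is not the authors' MANATEE implementation: the multi-output GP surrogate is replaced by independent per-objective scikit-learn GPs, the objective weights use a uniform placeholder rather than the paper's behaviour-based inclusion probabilities, and `toy_objectives` and the UCB-style acquisition are hypothetical stand-ins for expensive workflow objectives.

```python
# Minimal illustrative sketch of an adaptive-scalarization MOBO loop
# (NOT the authors' MANATEE implementation; see assumptions above).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def toy_objectives(x):
    # Two heuristic objectives of a single workflow parameter x in [0, 1] (hypothetical).
    return np.column_stack([np.sin(3 * x), -(x - 0.6) ** 2])

X = rng.uniform(0, 1, size=(5, 1))             # initial design
Y = toy_objectives(X.ravel())                  # observed objective values, shape (n, 2)
candidates = np.linspace(0, 1, 201).reshape(-1, 1)

for _ in range(20):
    # 1) Fit one GP surrogate per objective (independent GPs as a simplification).
    gps = [GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, Y[:, k])
           for k in range(Y.shape[1])]
    preds = [gp.predict(candidates, return_std=True) for gp in gps]

    # 2) Derive per-objective weights from properties of each posterior.
    #    Placeholder: equal weights; the paper instead scores "behaviours" of the
    #    posterior (e.g. whether the objective's maximum lies inside the range).
    w = np.ones(len(gps)) / len(gps)

    # 3) Scalarize (here with a simple UCB-style exploration bonus) and pick the next point.
    acq = sum(wk * (mu + sd) for wk, (mu, sd) in zip(w, preds))
    x_next = candidates[np.argmax(acq)].reshape(1, -1)

    # 4) Evaluate the expensive workflow at the chosen parameter and append.
    X = np.vstack([X, x_next])
    Y = np.vstack([Y, toy_objectives(x_next.ravel())])
```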
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. We clarified the noisy setting of the problem at hand (Introduction, pp. 1-2).
2. We added an example showing that an objective may be non-useful, but this would not necessarily be known prior to the analysis (Introduction, p. 2).
3. We added the definition for the variance of the fitted objective-specific observation noise (Section 2.2, p. 3).
4. We added citations for multi-output GPs (Bonilla et al., 2007); multi-objective optimization (Deb et al., 2014); the result about monotonically increasing scalarization functions (Roijers et al., 2013; Zintgraf et al., 2015); PESMO (Hernandez-Lobato et al., 2016); MESMO (Belakaria et al., 2019); USeMO (Belakaria et al., 2020); a review on using BO for protein design (Yang et al., 2019); contextual GP bandit optimization (Krause and Ong, 2011); Brent's method (Brent, 2013); the KroneckerMultiTaskGP model (Maddox et al., 2021); Thompson sampling (Thompson, 1933); and expected improvement (Mockus et al., 1978).
5. We discussed uncertainty-based MOBO methods as related work (Section 2.3, pp. 4-5).
6. We rewrote the section on applications of AutoML and BO to genomics and molecular biology (Section 2.4, p. 5).
7. We noted that using multi-output GPs may be suboptimal in some cases (Section 3.1, p. 5).
8. We clarified that in our setting, a practitioner has no prior preference over objectives (Section 3.2, p. 5) and that they do not participate during acquisition (Section 2.4, p. 5).
9. We clarified the definition of our Explainability behaviour (Section 3.2, p. 6).
10. We clarified that Theorem 3.1 applies to linear scalarization (Section 3.3, p. 6) and expanded our proof (Appendix E, p. 28).
11. We added the definition for the inter-objective agreement behaviour conditional distribution (Section 3.3, p. 7).
12. We added to the main text the hyperparameter values for the "Maximum not at boundary" behaviour used in our experiments (Section 3.3, p. 7).
13. We added an algorithm describing the overall approach (Section 3.4, p. 8).
14. We added USeMO to our evaluation and performed experiments with it. Results are added in Tables 1, 2 and Supplementary Figures 4, 5, 7, 8, 10. Details on how we ran USeMO are provided in Appendix A.1 and Supplementary Table 3. We discussed these new results in Sections 4.2, 4.3, pp. 10-11.
15. We added an explanation on why we construct meta-objectives for our evaluation (Section 3.5, p. 8).
16. We added plots showing Pareto dominated and non-dominated points for the objectives together with acquisitions of all methods (Supplementary Figures 5, 8) and a discussion of these plots (Sections 4.2, 4.3, pp. 10-11).
17. We added plots showing inclusion probabilities computed by MANATEE for the objectives (Supplementary Figures 6, 9) and a discussion of these plots (Sections 4.2, 4.3, pp. 10-11).
18. We added plots of "learning curves" of the regret metrics for all methods (Supplementary Figure 10), a description of how we computed them (Appendix A.5, p. 20), and a discussion of these plots (Sections 4.2, 4.3, pp. 10-11).
19. We updated the values in the scRNA-seq HVG selection experiment due to a minor bug with setting the random state of the clustering algorithm, but the interpretation of the results remained unchanged (Table 2, Supplementary Tables 5, 7, Appendix A.2, p. 19).
20. We discussed potential under-exploration by our method and a suggestion on how it could be mitigated (Discussion, p. 12).
21. We discussed a potential limitation of our inter-objective agreement behaviour if suboptimal but well-performing solutions are of interest (Discussion, p. 12).
22. We discussed the scenario when all objectives have their maximum at the boundary of the parameter range (Discussion, p. 12).
23. We added a note that there is no guarantee that objective weights will be on the same scale, as these depend on the definition of the behaviour conditional distributions (Discussion, p. 12).
24. We added a note that our setup assumes that all tasks are quantifiable for all observations but this may not always hold, like in contextual settings (Discussion, p. 12).
25. We added a clarification that zeros of the first derivative of the posterior mean are found using Brent's method as implemented in scipy (Appendix A.1, p. 16); a minimal illustrative sketch of this root-finding step is shown below.
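The following short Python sketch illustrates the root-finding step mentioned in item 25, using Brent's method from scipy.optimize. It is not the authors' code: `posterior_mean` is a hypothetical stand-in for the fitted GP posterior mean, and the derivative is approximated by finite differences rather than computed analytically.

```python
# Illustrative sketch: locate interior stationary points of a (hypothetical) 1-D GP
# posterior mean by bracketing sign changes of its first derivative and refining
# each bracket with Brent's method (scipy.optimize.brentq).
import numpy as np
from scipy.optimize import brentq

def posterior_mean(x):
    # Hypothetical smooth posterior mean over a 1-D parameter range.
    return np.sin(4 * x) * np.exp(-x)

def dmean(x, eps=1e-5):
    # Central finite-difference approximation of the first derivative.
    return (posterior_mean(x + eps) - posterior_mean(x - eps)) / (2 * eps)

grid = np.linspace(0.0, 3.0, 200)
signs = np.sign(dmean(grid))
# Each sign change of the derivative brackets one zero; refine it with Brent's method.
roots = [brentq(dmean, a, b)
         for a, b, sa, sb in zip(grid[:-1], grid[1:], signs[:-1], signs[1:])
         if sa * sb < 0]
print(roots)  # interior stationary points of the posterior mean
```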
Assigned Action Editor: ~Antti_Honkela1
Submission Number: 386