Generalization in Multi-Objective Machine Learning

TMLR Paper 195 Authors

20 Jun 2022 (modified: 28 Feb 2023) · Rejected by TMLR
Abstract: Modern machine learning tasks often require considering not just one but multiple objectives. For example, besides the prediction quality, these could be the efficiency, robustness, or fairness of the learned models, or any combination thereof. Multi-objective learning offers a natural framework for handling such problems without having to commit to early trade-offs. Surprisingly, statistical learning theory so far offers almost no insight into the generalization properties of multi-objective learning. In this work, we take first steps towards filling this gap: we establish foundational generalization bounds for the multi-objective setting, as well as generalization and excess bounds for learning with scalarizations. We also provide the first theoretical analysis of the relation between the Pareto-optimal sets of the true objectives and the Pareto-optimal sets of their empirical approximations from training data. In particular, we show a surprising asymmetry: all Pareto-optimal solutions can be approximated by empirically Pareto-optimal ones, but not vice versa.
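As a quick illustration of the Pareto-optimality notions the abstract refers to, here is a schematic sketch in our own notation; the paper's formal definitions and approximation guarantees may differ in detail:

```latex
% Schematic definitions, in our own notation (not necessarily the paper's).
% Objectives f_1, \dots, f_k : \mathcal{H} \to \mathbb{R}, with empirical
% counterparts \hat{f}_1, \dots, \hat{f}_k estimated from training data.
%
% h is Pareto-optimal for (f_1, \dots, f_k) iff no h' dominates it:
\[
  \nexists\, h' \in \mathcal{H} \;:\;
  f_i(h') \le f_i(h) \ \text{for all } i
  \quad\text{and}\quad
  f_j(h') < f_j(h) \ \text{for some } j .
\]
% The asymmetry claimed in the abstract then reads: every Pareto-optimal h
% for (f_1, \dots, f_k) has its objective values approximated by some
% Pareto-optimal \hat{h} of (\hat{f}_1, \dots, \hat{f}_k), whereas an
% empirically Pareto-optimal \hat{h} may be far from Pareto-optimal for
% the true objectives.
```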
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have uploaded a revision that reflects the reviewers' comments and our responses. Major changes to the original version are marked in blue in the PDF. In addition, we also fixed typos and clarified some phrases in the manuscript.

**Footnotes** We merged all footnotes into the main text, except for two, for which we found that doing so interrupted the flow of reading too much.

**Section 4** We now provide proofs or proof sketches for the main results in the main manuscript rather than only in the appendix.

**Section 4.3** We added a discussion of the *ray completeness* assumption, including the concrete example from our author response. We also clarify that we *do not* expect ray completeness to hold in general, and we discuss weaker alternatives to it.

**Section 5** This section is new. It details three concrete scenarios in which our results provide new insights or improve over prior work. First, we reinterpret the classic LASSO in light of multi-objective learning (a small illustrative sketch follows after this note). Our results on Pareto-optimality allow statements not only about optimal values of the loss, but also of the regularization term, which reflects the sparsity of the solution. Specifically, in general situations, solutions on the LASSO solution path might have a suboptimal regularizer value with respect to the true objective, which can be problematic when using LASSO for feature selection and/or interpretability. Second, we show how our scalarization-based results make it very simple to establish new (single-objective) generalization bounds, using the group-TERM and hierarchical-TERM losses (Li et al., 2021) as examples. Third, we integrate the comparison to (Cortes et al., 2020) here. For the proof of our result we use different steps than in the prior submission, which makes the proof only a few lines long.

**Section 6** This was the former Section 5. We now clarify its role as giving a high-level overview of other potential scenarios.

**Section 7** Here and everywhere else we replaced *characterization* with *analysis*. We hope that the revision clarifies that our results, while elementary in their proof techniques, provide a powerful framework for analyzing actual multi-objective problems as well as other problems that can be reinterpreted in a multi-objective way.
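To make the LASSO-as-multi-objective-learning view from the change note concrete, here is a minimal runnable sketch (our own construction and synthetic data, not code from the paper): each point on the LASSO regularization path solves a linear scalarization of the two empirical objectives, squared loss and ℓ1 norm, and hence lies on the empirical Pareto front; the revision's observation is that such points may still have a suboptimal regularizer value with respect to the *true* objectives.

```python
# Minimal sketch (our construction, not the paper's code): LASSO as a
# two-objective problem trading off squared loss f1 against sparsity f2 = ||w||_1.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]          # sparse ground-truth coefficients
y = X @ w_true + 0.1 * rng.normal(size=100)

# lasso_path solves min_w (1/(2n)) ||y - Xw||^2 + alpha * ||w||_1 over a grid
# of alphas, i.e., a family of linear scalarizations of (loss, l1-norm).
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)

for alpha, w in zip(alphas, coefs.T):
    f1 = np.mean((y - X @ w) ** 2)     # empirical objective 1: squared loss
    f2 = np.abs(w).sum()               # empirical objective 2: regularizer value
    print(f"alpha={alpha:.4f}  loss={f1:.4f}  l1={f2:.4f}")
```

Sweeping alpha traces the (loss, ℓ1) trade-off curve on the training sample; whether these empirically Pareto-optimal points remain near-optimal for the population objectives is exactly the kind of question the new Section 5 analyzes.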
Assigned Action Editor: ~Sivan_Sabato1
Submission Number: 195