% \documentclass{uai2022} % for initial submission
\documentclass[accepted]{uai2022} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2022} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2022} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib}[compress] % has a nice set of citation styles and commands
\bibliographystyle{abbrvnat}
\renewcommand{\bibsection}{\subsubsection*{References}\small}
\usepackage{mathtools} % amsmath with fixes and additions
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{xcolor}         % colors
\usepackage{graphicx}
\usepackage{xspace}
\usepackage{bm}
\usepackage{soul}
\usepackage{caption}
\usepackage{subcaption}
% \usepackage{todonotes}
\usepackage{wrapfig}
\usepackage{placeins}
\usepackage{enumitem}
\usepackage{footnote}
\usepackage{amsthm}
\usepackage{xr}

% \usepackage{thmtools, thm-restate}
\makesavenoteenv{tabular}
\makesavenoteenv{table}
\theoremstyle{plain}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{manualtheoreminner}{Theorem}
\newenvironment{manualtheorem}[1]{%
  \renewcommand\themanualtheoreminner{#1}%
  \manualtheoreminner
}{\endmanualtheoreminner}
\newtheorem{manuallemmainner}{Lemma}
\newenvironment{manuallemma}[1]{%
  \renewcommand\themanuallemmainner{#1}%
  \manuallemmainner
}{\endmanuallemmainner}
\newtheorem{proposition}{Proposition}[section]
\newtheorem{lemma}{Lemma}[section]
\newtheorem{corollary}{Corollary}[section]
\theoremstyle{definition}
\newtheorem{definition}{Definition}[section]
\newtheorem{assumption}{Assumption}[section]
\newtheorem{Assumption}{Assumption}[section]
\newtheorem{result}{Result}[section]
\theoremstyle{remark}
\newtheorem{remark}{Remark}[section]
\usepackage[linesnumbered,ruled,vlined]{algorithm2e}
\newtheorem{innercustomgeneric}{\customgenericname}
\providecommand{\customgenericname}{}
\newcommand{\newcustomtheorem}[2]{%
  \newenvironment{#1}[1]
  {%
   \renewcommand\customgenericname{#2}%
   \renewcommand\theinnercustomgeneric{##1}%
   \innercustomgeneric
  }
  {\endinnercustomgeneric}
}

\newcustomtheorem{customthm}{Theorem}
\newcustomtheorem{customlemma}{Lemma}

\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}

\makeatletter
\newcommand*{\addFileDependency}[1]{% argument=file name and extension
  \typeout{(#1)}
  \@addtofilelist{#1}
  \IfFileExists{#1}{}{\typeout{No file #1.}}
}
\makeatother

\newcommand*{\myexternaldocument}[1]{%
    \externaldocument{#1}%
    \addFileDependency{#1.tex}%
    \addFileDependency{#1.aux}%
}
\myexternaldocument{daulton_446-supp}

\newcommand{\xxcomment}[4]{\textcolor{#1}{[$^{\tiny\textsc{#2}}_{\tiny\textsc{#3}}$ #4]}}
\newcommand{\mb}[1]{\xxcomment{cyan}{M}{B}{#1}}
\newcommand{\de}[1]{\xxcomment{red}{D}{E}{#1}}
\newcommand{\sd}[1]{\xxcomment{blue}{S}{D}{#1}}
\newcommand{\ALG}{MORBO\xspace}
\newcommand{\HV}{\textsc{HV}}
\newcommand{\HVI}{\textsc{HVI}}
\newcommand{\HVC}{\textsc{HVC}}
\newcommand{\TSHVI}{$q\textsc{NEHVI-1}$}
%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Multi-Objective Bayesian Optimization over High-Dimensional Search Spaces}

% The standard author block has changed for UAI 2022 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[*,1,2]{\href{mailto:<sdaulton@fb.com>?Subject=Your UAI 2022 paper}{Samuel Daulton}{}}
\author[*,2]{David Eriksson}
\author[2]{Maximillian Balandat}
\author[2]{Eytan Bakshy}
% Add affiliations after the authors
\affil[*]{%
    Equal contribution
}
\affil[1]{%
    University of Oxford\\
    Oxford, UK
}
\affil[2]{%
    Meta\\
    Menlo Park, USA
}


\begin{document}
\maketitle

\begin{abstract}
    Many real-world scientific and industrial applications require optimizing multiple competing black-box objectives.
    When the objectives are expensive to evaluate, multi-objective Bayesian optimization (BO) is a popular approach because of its high sample efficiency.
    However, even with recent methodological advances, most existing multi-objective BO methods perform poorly on search spaces with more than a few dozen parameters and rely on global surrogate models that scale cubically with the number of observations.
    In this work, we propose \ALG, a scalable method for multi-objective BO over high-dimensional search spaces.
    \ALG identifies diverse globally optimal solutions by performing BO in multiple local regions of the design space in parallel using a coordinated strategy.
    We show that \ALG significantly advances the state-of-the-art in sample efficiency for several high-dimensional synthetic problems and real-world applications, including an optical display design problem and a vehicle design problem with $146$ and $222$ parameters, respectively.
    On these problems, where existing BO algorithms fail to scale and perform well, \ALG provides practitioners with order-of-magnitude improvements in sample efficiency over the current approach.
\end{abstract}
% \vspace{-2ex}
\section{Introduction}
% \vspace{-1ex}
The challenge of identifying optimal trade-offs between multiple complex objective functions is pervasive in many fields, including machine learning~\citep{sener2018mtmoo}, science~\citep{gopakumar2018moomaterial}, and engineering~\citep{marler2004survey,mathern2021}.
For instance, Mazda recently proposed a vehicle design problem in which the goal is to optimize the widths of $222$ structural parts in order to minimize the total weight of three different vehicles while simultaneously maximizing the number of common gauge parts~\citep{kohira2018proposal}.
Additionally, this problem has $54$ black-box constraints that enforce important performance requirements such as collision safety.
Evaluating a design requires either crash-testing a physical prototype or running computationally demanding simulations.
In fact, the original problem was solved on what at the time was the world's fastest supercomputer and took around $3{,}000$ CPU years to compute~\citep{oyama2017mazda}.
Another example is designing optical components for AR/VR applications, which requires optimizing complex geometries described by hundreds of parameters in order to identify designs that yield optimal trade-offs between image quality and efficiency of the optical device.
Evaluating a design involves either fabricating and measuring prototypes or running computationally intensive simulations.
For such problems, sample-efficient optimization is paramount.

Bayesian optimization (BO) has emerged as an effective, general, and sample-efficient approach for ``black-box'' optimization~\citep{jones98} and is highly effective for machine learning hyperparameter tuning~\citep{turner2021bayesian}.
However, in its basic form, BO is subject to important limitations.
In particular, (i) successful applications typically consider low-dimensional search spaces, usually with fewer than $20$ tunable parameters~\citep{frazier2018tutorial}, (ii) inference with the typical Gaussian process (GP) surrogate models incurs cubic time complexity with respect to the number of data points, which prevents usage in the large-sample regime that is often necessary for high-dimensional problems, and (iii) most methods focus on single-objective, unconstrained problems.
As a result, BO cannot easily be applied to either the aforementioned Mazda vehicle design problem or the AR/VR optical design problem. Moreover, high-dimensional multi-objective problems requiring sample-efficient optimization are prevalent in many real-world settings such as groundwater remediation \citep{akhtar2015}, cell network configuration \citep{dreifuerst2021optimizing}, and water resource management \citep{bai17}.
The state-of-the-art approach for this class of problems is NSGA-II~\citep{deb02nsgaii}, a popular evolutionary strategy whose poor sample efficiency hinders the progress of the scientists running these experiments.

\begin{figure*}[!ht]
    \centering
    \includegraphics[width=0.96\textwidth]{figures/conceptual_plot_no_lines.pdf}
    \caption{An illustration of \ALG with $3$ TRs on MW7~\citep{mw_test_problems}, a 2-objective benchmark problem with 2 parameters and 2 constraints. The left-most plot illustrates how \ALG's center selection technique centers the TRs at Pareto optimal points across different parts of the Pareto frontier. This encourages \ALG{} to explore diverse parts of the Pareto frontier, which is important for identifying the multiple disconnected regions of the frontier on this MW7 problem. The three right-most plots illustrate the TRs over the design space along with contours of, respectively, the 2 objective metrics and the feasibility metric indicating whether all black-box constraints are satisfied. Note that the TRs overlap with one another and contain data points that were collected by other TRs. Hence, sharing observations collected by different TRs provides local models with more observations than if each local model were only fitted to data collected by its corresponding TR.}
    \label{fig:conceptual}
\end{figure*}


% Although recent contributions have tackled the shortcomings of BO, they have done so largely independently.
% Namely, while multi-objective extensions of BO (e.g., ~\citet{paria2020flexible,konakovic2020diversity,daulton2020ehvi})
% take principled approaches to exploring the set of efficient trade-offs between outcomes, such methods generally do not scale well to problems with high-dimensional search spaces.
% On the other hand, many methods for high-dimensional BO (e.g.,~\citet{kandasamy15,wang2016rembo,HeSBO19,eriksson2019turbo,letham2019re,kirschner2019adaptive,eriksson2021high}) focus on single-objective optimization.
% Consequently, important problems that involve both a high-dimensional search space and multiple objectives are out of reach for existing methods~\citep{gaudrie2019high}.
In this paper, we close this gap by making BO applicable to challenging high-dimensional multi-objective problems.
To do so, we propose an algorithm called \ALG (``Multi-Objective Regionalized Bayesian Optimization'') that optimizes diverse parts of the global Pareto frontier in parallel using a coordinated set of local trust regions (TRs).
As shown in Figure~\ref{fig:conceptual}~(left), TRs are located at different solutions with diverse trade-offs between objectives.
\ALG{} performs local BO in each TR to mitigate over-exploration, a phenomenon that plagues many algorithms in high-dimensional settings~\citep{eriksson2021scalable}.
To enable scaling to large evaluation budgets, \ALG{} leverages \emph{local} GP surrogate models of the objective function, which reduces the time complexity for GP inference from $O(n^3)$, where $n$ is the number of data points, to $O(n_{\mathcal T}^3)$, where $n_{\mathcal T} \ll n$ is the number of local data points for a TR~$\mathcal T$.
To facilitate efficient and collaborative global optimization, \ALG{} \emph{passes information} between TRs in the following two ways: (1) observations collected by one TR are shared with the others, which is particularly important when the TRs overlap as shown in Figure~\ref{fig:conceptual}, and (2) \ALG{} selects a batch of candidates by leveraging the TRs to collaboratively maximize a global utility.
To ensure efficient global optimization, \ALG terminates under-performing TRs and allocates new TRs according to a global policy with a theoretical performance guarantee---a property that sets \ALG apart from most existing methods.

The significance of \ALG is that it is the \emph{first} multi-objective BO method that scales to hundreds of tunable parameters and thousands of evaluations, a setting where practitioners have previously had to fall back on alternative methods with much lower sample-efficiency, such as NSGA-II. Our comprehensive evaluation demonstrates that \ALG yields \emph{order-of-magnitude} savings in terms of time and resources compared to state-of-the-art methods on challenging high-dimensional multi-objective problems.

% Concretely,
% \begin{enumerate}[itemsep=0pt,topsep=1pt,leftmargin=14pt]
%     \item We propose \ALG , the first scalable algorithm for multi-objective BO that is practical for high-dimensional problems that require thousands of function evaluations.
%     \item \ALG uses local modeling with data-sharing to efficiently scale to high-dimensional input spaces and large evaluation budgets.
%     This unlocks the use of multi-objective BO on problems with hundreds of tunable parameters, a setting where practitioners have previously had to fall back on alternative, much less sample-efficient methods such as NSGA-II.
%     \item \ALG performs local optimization in multiple TRs to target diverse trade-offs using a collaborative, coordinated hypervolume-based acquisition criterion, which leads to well-distributed, high-quality Pareto frontiers.
%     \item We provide a comprehensive evaluation, demonstrating that \ALG significantly outperforms other state-of-the-art methods on challenging high-dimensional multi-objective problems and enables using BO for applications out of reach for existing BO methods. The achieved improvements in sample efficiency facilitate order-of-magnitude savings in terms of time and/or resources.
% \end{enumerate}

% The remainder of our work is structured as follows:
% In Section~\ref{sec:background}, we provide background on multi-objective and Bayesian optimization.
% In Section~\ref{sec:related_work}, we discuss related work.
% We detail our approach in Section~\ref{sec:hdbpomo} and present a comprehensive evaluation in Section~\ref{sec:Experiments}.
% We conclude in Section~\ref{sec:Discussion}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \vspace{-2ex}
\section{Background}
\label{sec:background}
% \vspace{-1ex}
\subsection{Preliminaries}

% \vspace{-1ex}
\subsubsection{Multi-Objective Optimization}
% \vspace{-1ex}
In multi-objective optimization (MOO), the goal is to maximize (without loss of generality) a vector-valued objective function $\bm f(\bm x) = [f^{(1)}(\bm x), \dotsc, f^{(M)}(\bm x)] \in \mathbb{R}^M$, where $M\geq2$, while satisfying black-box constraints $\bm g(\bm x) \geq \bm 0 \in \mathbb R^V$, where $V\geq 0$, $\bm x \in \mathcal X \subset \mathbb{R}^d$, and $\mathcal X $ is a compact set.
Usually, there is no single solution $\bm x^*$ that simultaneously maximizes all $M$ objectives and satisfies all $V$ constraints.
% Instead, the goal is to identify the set $\mathcal X^* \subseteq \mathcal X$ of designs such that the corresponding set of objective vectors $\mathcal P^* = \{\bm f(\bm x) | \bm x \in \mathcal X^*\}$ are \emph{Pareto-optimal}.
Hence, objective vectors are compared using Pareto domination.
\begin{definition}
An objective vector $\bm f(\bm x)$ \emph{Pareto-dominates} %another vector
$\bm f(\bm x')$, denoted as $\bm f(\bm x)\succ \bm f(\bm x')$,
if $f^{(m)}(\bm x) \geq f^{(m)}(\bm x')$ for all $m=1, \dotsc, M$ and there exists at least one $m \in \{1, \dotsc, M\}$ such that $f^{(m)}(\bm x) > f^{(m)}(\bm x')$.
\end{definition}
\begin{definition}
The \emph{Pareto frontier} (PF) is the set of optimal trade-offs $\mathcal P(X)$ over a set of designs $X \subseteq \mathcal X$:
\[\mathcal P(X) = \{\bm f(\bm x) : \bm x \in X, \nexists ~\bm x' \in X ~\text{s.t.}~ \bm f(\bm x') \succ \bm f(\bm x)\}.\]
Under black-box constraints, the \emph{feasible Pareto frontier} is defined as $\mathcal P_\text{feas}(X) = \mathcal P(\{\bm x \in X : \bm g(\bm x) \geq \bm 0\})$.
\end{definition}
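As a small illustrative example (the values are hypothetical and chosen only for exposition), suppose $M=2$ and the designs in $X$ attain the objective vectors
\[
  (1,3),\quad (2,2),\quad (3,1),\quad (1,1),\quad (2,1).
\]
Then $(2,2) \succ (1,1)$ and $(2,2) \succ (2,1)$, whereas no two of $(1,3)$, $(2,2)$, and $(3,1)$ dominate one another, so $\mathcal P(X) = \{(1,3), (2,2), (3,1)\}$.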
The goal of an MOO algorithm is to identify an approximation $\mathcal P(X_n)$ of the true PF $\mathcal P(\mathcal X)$ within a pre-specified budget of $|X_n| = n$ function evaluations. The quality of a PF is often evaluated using the hypervolume (\HV{}) indicator.
\begin{definition}
The \emph{hypervolume indicator}, $\HV{}(\mathcal P(X) | \bm r)$, is the $M$-dimensional Lebesgue measure $\lambda_M$ of the region dominated by $\mathcal P(X)$ and bounded from below by a reference point $\bm r \in \mathbb R^M$.
\end{definition}
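For instance, under the illustrative (hypothetical) choice $M=2$, $\bm r = (0,0)$, and $\mathcal P(X) = \{(1,3), (2,2), (3,1)\}$, the dominated region is the union of the rectangles $[0,1]\times[0,3]$, $[0,2]\times[0,2]$, and $[0,3]\times[0,1]$. Integrating the height of this staircase over the first objective gives
\[
  \HV{}(\mathcal P(X) | \bm r) = 1 \cdot 3 + 1 \cdot 2 + 1 \cdot 1 = 6.
\]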

% (for a formal definition see, e.g.,~\citet{Auger09}).
The reference point is typically provided by the practitioner based on domain knowledge~\citep{yang2019}.
MOO problems are often addressed using evolutionary algorithms (EAs) such as NSGA-II~\citep{deb02nsgaii}.
However, EAs generally suffer from high sample complexity, rendering them inapplicable under small evaluation budgets.
% \vspace{-1ex}
\subsubsection{Bayesian Optimization}
% \vspace{-1ex}
When high sample-efficiency is required, Bayesian optimization (BO) is a popular approach \citep{frazier2018tutorial}.
BO relies on a probabilistic surrogate model and an acquisition function that uses the surrogate model to provide the utility of evaluating a set of design points on the black-box function. The acquisition function is responsible for balancing exploration and exploitation.
In the multi-objective setting, a common approach is to optimize random scalarizations of the objectives~\citep{parego, paria2020flexible} using a single-objective acquisition function.
A more principled approach is to directly optimize the Pareto frontier by selecting candidates with maximum hypervolume improvement either in expectation under the GP posterior~\citep{emmerich2006} or using Thompson sampling (TS) \citep{tsemo}.
% Recently, Thompson sampling (TS) has also been shown to have strong empirical performance with randomly scalarized or hypervolume objectives~\citep{paria2020flexible,tsemo}.
% \vspace{-1ex}
\subsection{Related Work}
% \vspace{-1ex}
\subsubsection{Multi-objective Bayesian optimization}
% \vspace{-1ex}
There have been many recent contributions to multi-objective BO \citep[e.g.,][]{konakovic2020diversity, daulton2020ehvi, daulton2022robust,tsemo}, but very few methods consider the high-dimensional setting with large evaluation budgets. All of the methods described below rely on global GP models.
%, which is problematic: (i) as the dimensionality of the search space grows, the over-exploration issue mentioned previously becomes more prominent; (ii) a global GP model scales poorly with the number of function evaluations.
As a result, these methods have mostly been evaluated on low-dimensional problems, typically $d \ll 10$ \citep{konakovic2020diversity, tsemo}.
In the multi-objective BO literature, the largest search space we have found consists of $27$ parameters~\citep{paria2020flexible}.
Nevertheless, for completeness we review multi-objective BO methods that support generating large batches of designs.
DGEMO \citep{konakovic2020diversity} uses a hypervolume-based objective with heuristics to encourage diversity while exploring the PF.

Parallel expected hypervolume improvement ($q$EHVI)~\citep{daulton2020ehvi} has strong empirical performance, but its computational complexity scales exponentially with the batch size.
$q$NEHVI~\citep{daulton2021nehvi} improves scalability with respect to the batch size, but like DGEMO and $q$EHVI, $q$NEHVI has only been evaluated on low-dimensional search spaces.
TSEMO~\citep{tsemo} optimizes approximate GP function samples using NSGA-II and uses a hypervolume-based objective for selecting a batch of points from the NSGA-II population.
ParEGO~\citep{parego} and TS-TCH~\citep{paria2020flexible} use random Chebyshev scalarizations with, respectively, parallel expected improvement~\citep{jones98} and Thompson sampling, in which a design is sampled according to its posterior probability of being optimal~\citep{thompson}.
ParEGO has been extended to the batch setting in various ways including: (i) MOEA/D-EGO~\citep{zhang_moead}, an algorithm that optimizes multiple scalarizations in parallel using MOEA/D~\citep{moead}, and (ii) $q$ParEGO~\citep{daulton2020ehvi}, which uses composite objectives with sequential greedy batch selection under different scalarization weights.
Information-theoretic methods \citep[e.g.,][]{pesmo, pfes} have also garnered recent interest.

LaMOO \citep{zhao2021multiobjective} is a recent work that partitions the search space into ``good'' and ``bad'' regions and samples new designs from ``good'' regions using $q$EHVI or CMA-ES \citep{cmaes}. However, LaMOO-$q$EHVI relies on global GPs and is therefore prohibitively time-consuming with large evaluation budgets.
In addition, the authors propose to use rejection sampling to ensure that samples come from the (typically non-rectangular) ``good'' region, but rejection sampling is prohibitively time-consuming in high-dimensional search spaces (see Appendix~\ref{appdx:lamoo} for further discussion).
%, but are inherently prone to over-exploration in high-dimensional spaces, and %\citet{garridomerchn2020parallel} is the only such method that supports batch candidate generation.
% DE: Cutting this part for now to get rid of two references:
% Non-Bayesian methods such as MOPLS\footnote{At the time of writing, no code is publicly available.}~\citep{akhtar2019efficient} use deterministic surrogates, e.g., radial basis function (RBF) interpolants, which perform poorly when the observations are noisy~\citep{fasshauer2006scattered}.
% \vspace{-2ex}
\subsubsection{High-dimensional Bayesian optimization}
% \vspace{-1ex}
Two popular approaches for high-dimensional BO are (1) mapping the high-dimensional inputs to a low-dimensional space via a random embedding~\citep{wang2016rembo,HeSBO19,letham2019re} and (2) exploiting additive structure~\citep{kandasamy15,gardner2017discovering}.
% TODO: Expand this for the CR to include BOCK, LineBO, SAASBO, etc.
However, both families of methods require strong assumptions on the structure of the problem (low-dimensional linear or additive structure, respectively), and often perform poorly if the assumptions do not hold~\citep{eriksson2021high}.
This is especially problematic when optimizing multiple objectives since all objectives need to have the same assumed structure, which is unlikely in practice.
\citet{eriksson2021high} leverage the weaker assumption that the objective only depends on a small subset of the parameters, and \citet{eriksson2021latencyaware} extended this approach to the multi-objective setting. However, this approach
% and its multi-objective extension \citep{eriksson2021latencyaware}
requires using computationally demanding Markov chain Monte Carlo methods for fitting the model, which is only feasible in the small-data regime.
% \vspace{-1ex}
\subsubsection{Trust Region Bayesian Optimization}
% \vspace{-1ex}
Another popular method for high-dimensional BO is TuRBO \citep{eriksson2019turbo}, which performs BO in local trust regions (TRs) to avoid over-exploration. In contrast with LaMOO~\citep{zhao2021multiobjective}, which uses non-rectangular ``good'' regions, TuRBO uses hyperrectangular TRs, where each TR $\mathcal{T}$ has a center point $\bm x_\text{center}$ and an edge length $L \in [L_\text{min}, L_\text{max}]$.
Each TR maintains success and failure counters that record the number of consecutive samples generated from the TR that improved or failed to improve (respectively) the objective.
If the success counter exceeds a predetermined threshold~$\tau_{\text{succ}}$, the TR length is increased to $\min\{2L, L_\text{max}\}$ and the counter is reset to zero.
Similarly, after $\tau_{\text{fail}}$ consecutive failures the TR length is set to $L/2$ and the failure counter is set to zero.
Finally, if the length $L$ drops below a minimum edge length $L_\text{min}$, the TR is terminated and a new TR is initialized.
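To make the update rule concrete, consider the following hypothetical values, chosen only for illustration: $L = 0.8$, $L_\text{max} = 1.6$, and $L_\text{min} = 0.01$. One expansion gives $\min\{2 \cdot 0.8, 1.6\} = 1.6$, and any further expansion leaves $L$ at $L_\text{max}$. Starting again from $L = 0.8$, repeated contractions give
\[
  0.8 \to 0.4 \to 0.2 \to 0.1 \to 0.05 \to 0.025 \to 0.0125 \to 0.00625,
\]
so that, absent any intervening expansion, the TR is terminated after the seventh contraction, that is, after $7\tau_{\text{fail}}$ consecutive failures.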

In contrast with the aforementioned methods, TuRBO makes no strong assumptions about the objectives.
Although TuRBO has been extended to handle black-box constraints \citep{eriksson2021scalable}, to our knowledge, all existing TR-based BO methods target single-objective optimization.
In addition, TuRBO does not pass information between TRs, which results in an inefficient use of the evaluation budget; indeed, existing TR-based methods have not shown significant improvement from using multiple TRs.
Lastly, even though optimization is restricted to a local TR, TuRBO fits GP models to the entire history of data collected by a single TR, which can lead to poor scalability in settings where TRs restart infrequently.
% \vspace{-2ex}
\subsection{Issues with Scalarized TuRBO}
% \vspace{-1ex}
Since ParEGO is a well-established method (in low-dimensional settings) that optimizes random Chebyshev scalarizations, a reasonable approach would be to extend TuRBO to the MOO setting by using multiple TRs in parallel where each TR optimizes a different random Chebyshev scalarization of the objectives.
However, as we demonstrate in the left subplot of Figure~\ref{fig:intro_fig}, this approach results in a PF with very poor coverage. This is because a single scalarization is used for the lifetime of each TR in order to maintain a stable objective. Optimizing a single scalarization per TR often leads to better solutions with respect to that scalarization than optimizing the entire PF using a hypervolume-based acquisition function, which requires exploring different objective trade-offs. However, if TRs are not restarted frequently (e.g., because TuRBO continues to find better solutions with respect to that scalarization), only a small number of scalarizations will be used, which can lead to poor coverage of the PF. As shown in Figure~\ref{fig:intro_fig}, \ALG yields PFs with better coverage (diversity of trade-offs). In addition, the TRs in TuRBO are independent: they do not pass information about evaluated designs and observations, and they do not collaboratively aim to optimize the global PF; rather, they act in isolation to optimize their own objectives. Together, these issues lead to an inefficient use of the sample budget.


\begin{figure*}[!ht]
    \centering
    \includegraphics[width=0.96\textwidth]{figures/intro_3_plots_new.pdf}
    \caption{
        Objective values achieved on a $2$-objective DTLZ2 function with $d=100$ after $600$ evaluations, batch size $50$, and $3$ TRs.
        The scatter plot illustrates the search behavior.
        The grey circles indicate the initial space-filling design, which is the same for both methods. The other marker shapes and colors indicate which of the $3$ TRs obtained a given solution.
        The black line indicates the approximate Pareto frontier identified by each method.
        (Left) A straightforward extension of TuRBO where each TR optimizes a random Chebyshev scalarization of the objectives does not explore the trade-offs between the objectives because the TRs are rarely terminated under this approach, which leads to only a few scalarizations being used.
        (Center) In contrast, \ALG explores the entirety of the PF by employing a center selection strategy that actively targets under-explored regions of the Pareto frontier and a hypervolume-based acquisition function that is known to reward high-quality Pareto frontiers~\citep{zitzler03, Couckuyt14, yang2019}.
        (Right) \ALG can discover disconnected regions of the global PF on the MW7 function ($d=10$, with $2$ constraints) by using $5$ TRs to locally optimize disjoint regions of the PF collaboratively and in parallel. This is in stark contrast with TuRBO with Chebyshev scalarizations, which, as the left plot shows, yields approximate Pareto frontiers with poor coverage and diversity even when the true PF is connected and simple.
    }
    \label{fig:intro_fig}
\end{figure*}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \vspace{-2ex}
\section{MORBO}
% \vspace{-1ex}
\label{sec:hdbpomo}
We now introduce \ALG, a \emph{collaborative} multi-TR approach for constrained high-dimensional multi-objective BO.
Rather than following TuRBO's approach of employing multiple independent TRs, \ALG shares observations across TRs to provide each TR with all available information about the objectives and constraints relevant for local optimization in the TR. Moreover, \ALG further departs from TuRBO by (1) selecting TR center points in a coordinated fashion to encourage identifying Pareto frontiers with good coverage, (2) choosing new candidate designs by collaboratively optimizing a shared global utility, and (3) employing local models to reduce computational complexity and improve scalability in large data regimes.
As shown in the center plot of Figure~\ref{fig:intro_fig}, \ALG identifies a high-quality PF with much better coverage than the aforementioned simple TuRBO extension.
For the remainder of this section, we describe the core components of \ALG, which are also summarized in Algorithm~\ref{algo}.
% Furthermore, data is not shared across TRs when using multiple TRs, which results in an inefficient use of the evaluation budget.
%

\begin{algorithm*}[!ht]
    \DontPrintSemicolon
    \KwIn{Objective functions $\bm f$, Number of trust regions $n_\text{TR}$, Initial trust region length $L_\text{init}$, Maximum trust region length $L_{\max}$, Minimum trust region length $L_{\min}$.}
    \KwOut{Approximate Pareto frontier $\mathcal P_n$}
    Evaluate an initial set of points and initialize the trust regions $\mathcal T_1, ..., \mathcal T_{n_\text{TR}}$ using the center selection procedure described in Section~\ref{sec:center_selection} and mark center points as unavailable for other trust regions. \\
    $X_0 \leftarrow \emptyset{}, Y_0 \leftarrow \emptyset{}, t \leftarrow 1$\\
    \While{budget not exhausted}{
        Fit a local model within each trust region. \\
        Select $q$ candidates using the sequential greedy \HVI{} procedure described in Section~\ref{sec:batch_selection}.\\
        Evaluate candidates on the true objective functions and obtain new observations.\\
        \For{$j=1,..., n_\text{TR}$}{
            Update trust regions with new observations as described in Section \ref{sec:hdbpomo}.\\
            Increment success/failure counters as described in Section \ref{sec:hdbpomo} for observations from $\mathcal T_j$.\\
            Update edge length $L_j$ for $\mathcal T_j$.\\
            \If{$L_j < L_\text{min}$}{
                Terminate $\mathcal T_j$.\\
                % Terminate $\mathcal T_j$ and reinitialize $\mathcal T_j$ with edgelength $L_\text{init}$ and $N_\text{init}$ new random points.
                Fit GP to restart points $\mathcal D_{t-1} = (X_{t-1},Y_{t-1})$: $\bm f_{t-1} \sim P(\bm f | \mathcal D_{t-1})$.\\
                Sample $\bm\lambda\sim S_+^{M-1}$ and $\tilde{\bm f}_{t-1} \sim P(\bm f | \mathcal D_{t-1})$, where $S_+^{M-1} = \{\bm w\in \mathbb R_+^M : ||\bm w||_2 = 1\}$.\\
                $\bm x_t \leftarrow \argmax_{\bm x \in \mathcal X} s_{\bm\lambda}[\tilde{\bm f}_{t-1}(\bm x)]$, where $s_{\bm\lambda}[\bm y] = \min_m (\max(\frac{y_m}{\lambda_m}, 0))^M$ and $y_m$ denotes the $m^\text{th}$ element of $\bm y$.\\
                Evaluate $\bm x_t$ on the true objective functions and obtain new observation $\bm y_t$.\\
                Reinitialize $\mathcal T_j$ with edge length $L_\text{init}$ centered at $\bm x_t$.\\
                Set $X_t \leftarrow X_{t-1} \cup \{\bm x_t\}, Y_t \leftarrow Y_{t-1} \cup \{\bm y_t\}$, $t \leftarrow t+1$.

            }
            Update center to the available point with maximum \HVC{} (globally if $\mathcal T_j$ was terminated, otherwise within $\mathcal T_j$).
        }
    }
    \Return{Approximate PF across observed function values}.
    \caption{Summary of \ALG}
\label{algo}
\end{algorithm*}
% \vspace{-1ex}
\subsection{Collaborative Batch Selection via Global Utility Maximization}\label{sec:batch_selection}
% \vspace{-2ex}
Maximizing hypervolume improvement (\HVI{}) has been shown to produce high-quality and diverse PFs~\citep{emmerich2006}.
Given a reference point, the hypervolume improvement from a set of points is the increase in \HV{} when adding these points to the previously selected points.
Expected HVI (EHVI) is a popular acquisition function that integrates \HVI{} over the GP posterior.
However, maximizing EHVI directly requires re-computing the GP posterior and sampling from it in each gradient step, which becomes prohibitively slow as the number of objectives (and constraints) and the number of in-sample data points increase.

To allow scalability to large batch sizes $q$, we instead use Thompson sampling (TS) to draw $q$ posterior samples from the GP and optimize \HVI{} under each realization. This approach can be viewed as a single-sample approximation of EHVI \citep{daulton2021nehvi}.
We select $q$ points $\bm x_1, ..., \bm x_q$ for the next batch in a \emph{sequential greedy} fashion and condition upon the previously selected points in the batch by computing the HVI with respect to the current PF $\mathcal P$.
In particular, to select the $i^\text{th}$ point from a set of $r$ candidate points $\hat{\bm x}_1, \ldots, \hat{\bm x}_r$ we draw a sample from the joint posterior over $\bm f(\{\bm x_{1}, \ldots, \bm x_{i-1}\} \cup \{\hat{\bm x}_1, \ldots, \hat{\bm x}_r\})$, which yields the realization $\{\tilde{\bm f}(\bm x_{1}), \ldots, \tilde{\bm f}(\bm x_{i-1}), \tilde{\bm f}(\hat{\bm x}_1), \ldots, \tilde{\bm f}(\hat{\bm x}_r)\}$.
We select the $i^\text{th}$ point as the candidate point that maximizes the HVI jointly with the realizations $\tilde{\bm f}(\bm x_{1}), \ldots, \tilde{\bm f}(\bm x_{i-1})$ of the previously selected points as shown in Figure~\ref{fig:hvi}.
Conditioning on the previously selected points and computing the HVI under a sample from the joint posterior over the previously selected points and the discrete set of candidates leads to more diverse batch selection compared to selecting each point independently. Moreover, this approach effectively lets TRs collaboratively maximize the global HVI utility function.
Using this global utility, an individual TR considers the iteration a success if at least one proposed candidate improves the global HV and a failure otherwise.
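
As a minimal worked illustration of \HVI{} (a toy example chosen for exposition, not taken from our experiments), consider $M=2$ maximization objectives with reference point $\bm r = (0, 0)$ and current PF $\mathcal P = \{(1, 3), (3, 1)\}$. The region dominated by $\mathcal P$ is the union of the rectangles $[0,1]\times[0,3]$ and $[0,3]\times[0,1]$, so
\begin{equation*}
    \HV{}(\mathcal P) = 1\cdot 3 + 3\cdot 1 - 1\cdot 1 = 5.
\end{equation*}
Adding a candidate with sampled value $\tilde{\bm f}(\hat{\bm x}) = (2, 2)$ enlarges the dominated region to an area of $1\cdot 3 + 1\cdot 2 + 1\cdot 1 = 6$ (sweeping along the first objective), so its \HVI{} is $6 - 5 = 1$. A second candidate with the same sampled value would have an \HVI{} of $0$ once the first has been added, which illustrates why conditioning on previously selected points discourages redundant selections within a batch.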

Another benefit of HV-based acquisition functions is that they naturally provide utility values for sets of points, which enables the TRs to target different parts of the PF. %  and collaboratively maximize HV.
This is particularly appealing in settings where the PF may be disjoint or may require exploring different parts of the search space. As shown in the right plot of Figure~\ref{fig:intro_fig}, \ALG recovers diverse regions of a disconnected PF.
Lastly, we note that this batch selection strategy also makes it straightforward to implement \emph{fully asynchronous} optimization, where evaluations are dispatched to different ``workers'' and new candidates are generated whenever there is capacity in the worker pool.
In the asynchronous setting, success/failure counters and TRs can be updated after every $q$ observations are received, and intermediate observations can immediately be used to update the local models.
% While it may be possible to improve the performance of TuRBO using more sophisticated scalarization approaches, the ablation study in Figure~\ref{fig:ablation_study} shows that using hypervolume improvement in \ALG{} improves performance.
% \vspace{-1ex}
\subsection{Coordinated Trust Region Center Selection}\label{sec:center_selection}
% \vspace{-1ex}
In (constrained) single-objective optimization, previous work centers the local TR at the best (feasible) observed point.
However, in the multi-objective setting, there is typically no single best solution.
Assuming noise-free observations, \ALG selects the center to be the feasible point on the PF with maximum hypervolume contribution (\HVC{})~\citep{Beume2007,loshchilov2011}.
If there is no feasible point, \ALG chooses the point with the smallest total constraint violation (see Appendix~\ref{appdx:constraint_handling} for details on center selection with constraints).
Given a reference point, the \HVC{} of a point on the PF is the reduction in HV if that point were to be removed; that is, the HVC of a point is its exclusive contribution to the PF.
Centering a TR at the point with maximal HVC collected by that TR promotes coverage across the PF, as points in crowded regions will have lower contribution.
\ALG selects TR centers based on their HVCs in a sequential greedy fashion, excluding points that have already been selected as the center for another TR.
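
To make the role of \HVC{} concrete, consider a toy example (assumed for illustration only) with $M=2$ maximization objectives, reference point $\bm r = (0,0)$, and PF $\mathcal P = \{(1,4), (2,3), (5,1)\}$. Sweeping along the first objective gives $\HV{}(\mathcal P) = 1\cdot 4 + 1\cdot 3 + 3\cdot 1 = 10$, and removing each point in turn yields
\begin{align*}
    \HVC{}\big((1,4)\big) &= 10 - (2\cdot 3 + 3\cdot 1) = 1, \\
    \HVC{}\big((2,3)\big) &= 10 - (1\cdot 4 + 4\cdot 1) = 2, \\
    \HVC{}\big((5,1)\big) &= 10 - (1\cdot 4 + 1\cdot 3) = 3.
\end{align*}
The isolated point $(5,1)$ has the largest \HVC{} and would be selected as the first center, whereas the two neighboring points $(1,4)$ and $(2,3)$ have smaller contributions. With $(5,1)$ marked as unavailable, a second TR would be centered at $(2,3)$.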

% Furthermore, a TR gives higher priority to any point currently inside of the TR over points outside of the TR (with potentially larger HVC) to maintain local optimization.
% Besides encouraging diversity in the exploration of the PF, using multiple TRs results in multiple GP models, each fitted on a smaller amount of data, which greatly improves scalability.
% \vspace{-1ex}
\subsection{Local Modeling}\label{sec:local_modeling}
% \vspace{-2ex}
Most BO methods use a single global GP model, often with a stationary kernel (e.g., Mat\'ern-$5/2$) using automatic relevance determination (ARD) fitted to all observations collected so far.
While a global model is necessary for most BO methods, \ALG only requires each model to be accurate within the corresponding TR.
To increase scalability, we employ local modeling where we only include the observations contained within a local modeling hypercube with edge length~$2L$. The motivation for using the observations from a slightly larger hypercube is to improve the model close to the TR boundary.

In previous trust region BO works \citep{eriksson2019turbo, eriksson2021scalable, wan2021think}, each TR uses a GP that is fitted to all of the observations collected by that TR (rather than only a set of local observations in or near the TR), which leads to scalability issues due to the cubic time complexity of GP inference if the TR collects many observations.
% This may lead to poor performance since stationary kernels assume that the lengthscales are constant across the input space; an assumption that is often violated in practice~\citep{snoek14}.
In addition, fitting a GP solely to data collected by a single TR ignores observations collected by other TRs and makes inefficient use of the sampling budget.
In contrast, \ALG shares observations across TRs and employs local models, where each model is fit to all observations within a hypercube with edge length~$2L$. This significantly reduces the computational cost since exact GP fitting scales cubically with the number of data points. Under mild assumptions on the distribution of data across TRs, using local models results in speedups of $O(n_\text{TR}^2 / \eta^3)$, where $\eta$ is the average number of TR modeling spaces a data point resides in. Empirically, we demonstrate (see Figure~\ref{fig:tr_traces} in Appendix~\ref{appdx:additional_results}) that $\eta < 1$ as the optimization progresses and the TRs shrink, and we find that this translates into speedups of two orders of magnitude relative to global modeling, as shown in Appendix~\ref{appdx:wall_time_comparisons}. See Appendix~\ref{appdx:complexity_local} for more details on the complexity.
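
A rough sketch of where this speedup comes from, under the simplifying assumption (made here only for illustration) that the $N$ observations are spread evenly over the $n_\text{TR}$ local modeling regions: each local model is then fit to roughly $\eta N / n_\text{TR}$ points, so the total cost of exact GP fitting is
\begin{equation*}
    n_\text{TR} \cdot O\bigg(\Big(\frac{\eta N}{n_\text{TR}}\Big)^3\bigg) = O\bigg(\frac{\eta^3 N^3}{n_\text{TR}^2}\bigg),
\end{equation*}
compared to $O(N^3)$ for a single global model. The ratio of the two costs is $n_\text{TR}^2 / \eta^3$. For instance, with $n_\text{TR} = 5$ this ratio is $25$ when $\eta = 1$ and $25 / 0.5^3 = 200$ when $\eta = 0.5$.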
% % \vspace{-1ex}
\subsection{Re-initialization Strategy}
% % \vspace{-1ex}
Although \ALG performs local optimization within a TR, we ensure global optimization by re-initializing TRs using a
principled  %theoretically-grounded
technique based on hypervolume scalarizations~\citep{zhang2020random}. An HV scalarization is defined as $s_{\bm\lambda}[\bm y] = \min_m (\max(\frac{y_m}{\lambda_m}, 0))^M$, where $y_m$ denotes the $m^\text{th}$ component of $\bm y$~\citep{zhang2020random}. Let $\mathcal D_{t-1} = (X_{t-1}, Y_{t-1})$ be the set of previous re-initialization (restart) points $X_{t-1} = \{\bm x_i\}_{i=1}^{t-1}$ and corresponding observations $Y_{t-1} = \bm f(X_{t-1})$, where $X_0 = \emptyset{}$ and $Y_0 = \emptyset{}$. Given $\mathcal D_{t-1}$, we determine the center point $\bm x_t$ of the new TR by maximizing a random HV scalarization of the objectives under a posterior sample from a global GP posterior conditioned on $\mathcal D_{t-1}$: $\tilde{\bm f} \sim P(\bm f|\mathcal D_{t-1})$. This ensures that TRs are initialized in diverse parts of the objective space and yields a global optimization performance guarantee (Section~\ref{sec:theory}).
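
As a small numerical illustration of the scalarization (with values chosen arbitrarily for exposition), let $M=2$ and $\bm\lambda = (0.6, 0.8)$, which satisfies $||\bm\lambda||_2 = 1$. For $\bm y = (0.3, 0.8)$ and $\bm y' = (0.6, 0.2)$ we obtain
\begin{align*}
    s_{\bm\lambda}[\bm y] &= \min\big\{(0.3/0.6)^2, (0.8/0.8)^2\big\} = \min\{0.25, 1\} = 0.25, \\
    s_{\bm\lambda}[\bm y'] &= \min\big\{(0.6/0.6)^2, (0.2/0.8)^2\big\} = \min\{1, 0.0625\} = 0.0625,
\end{align*}
so this particular $\bm\lambda$ prefers $\bm y$, whose components are better balanced along the direction $\bm\lambda$. Any point with a non-positive component receives a value of $0$ because of the inner $\max(\cdot, 0)$. Different draws of $\bm\lambda$ favor different trade-offs between the objectives.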

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Theoretical Analysis}
% \vspace{-1ex}
\label{sec:theory}
We analyze the performance of \ALG{} in terms of its cumulative HV regret. The instantaneous HV regret $R(\mathcal P_t)$ after $t$ TR restarts is defined as the difference in HV dominated by the true Pareto frontier $\mathcal P^*$ and the approximate Pareto frontier $\mathcal P_t$: $R(\mathcal P_t) = \HV{}(\mathcal P^*) -  \HV{}(\mathcal P_t).$ The (cumulative) HV regret after $T$ restarts is the sum of the instantaneous regret over all restarts: $R_T = \sum_{t=1}^T R(\mathcal P_t)$. First, we show that a TR will only evaluate a finite number of samples before restarting.
\begin{lemma}
% \begin{lemma}
\label{lemma:restarts}
Let $\bm f \in [0, B]^M$, and assume that \ALG{} only considers a newly evaluated sample to be an improvement (for updating the corresponding TR's success and failure counters) if it increases the HV by at least $\delta \in \mathbb R^+$, and assume that the success counter threshold is $\tau_\text{succ} = \infty$.\footnote{As stated in Appendix \ref{appdx:experimentdetails}, we use $\tau_\text{succ} = \infty$ in all of our experiments.} Then each TR will only evaluate a finite number of samples.
\end{lemma}
The proof is given in Appendix~\ref{appdx:proofs}. Having established that TRs only evaluate a finite number of designs, we now bound the hypervolume regret with respect to the number of restarted TRs.
The bound leverages the kernel-dependent maximum information gain $\gamma_T$, which measures the decrease in uncertainty after $T$ observations and is commonly used to analyze regret in BO \citep{ucb}.
\begin{theorem}
\label{thm:hv_regret}
Let $\bm f \in [0,B]^M$ for $B>0$ and let each component $f^{(m)}$ for $m=1, ..., M$ follow a Gaussian distribution with marginal variances $\sigma \leq 1$ and independent observation noise $\epsilon_m \sim \mathcal N(0, \sigma_m^2)$ such that $\sigma_m^2 \leq \sigma^2 \leq 1$. Let $\mathcal P_t$ denote the Pareto frontier over $\bm f(X_t)$, where $X_t$ is the set of TR re-initialization points after $t$ TRs have been restarted.
%Assume that \ALG{} only considers a newly evaluated sample to be an improvement (for updating the corresponding TR's success and failure counters) if it increases the hypervolume by at least $\delta \in \mathbb R^+$ and assume that success counter threshold $\tau_\text{succ} = \infty$.
Suppose further that the conditions of Lemma~\ref{lemma:restarts} hold.
Then, the cumulative hypervolume regret $R_T$ of \ALG after $T$ restarts is bounded by:
% $$R_T \leq M^2(\sqrt{2e\pi}B/2)^Md^{\frac{1}{2}}[\gamma_T T\ln(T)]^{\frac{1}{2}}.$$
$$R_T \leq M^2(\sqrt{2e\pi}B/2)^M \sqrt{d\gamma_T T\ln(T)}.$$
% In addition,
% $\HV(\mathcal P_t) \geq \HV{}(\mathcal P^*) - \varepsilon_T$ where $\varepsilon_T = O\big(M^2(\sqrt{2e\pi}B/2)^Md^{\frac{1}{2}}\big[\gamma_T\frac{\ln(T)}{T}\big]^{\frac{1}{2}}\big)$.
\end{theorem}
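For kernels whose maximum information gain $\gamma_T$ grows polylogarithmically in $T$, this regret bound is $\tilde{\mathcal O}(\sqrt{T})$, i.e., on the order of $\sqrt{T}$ up to logarithmic factors.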
This bound is significant because, to our knowledge, \citet{zhang2020random} is the only other work to bound the HV regret of multi-objective BO algorithms.
This makes \ALG{} the first sample-efficient, large-scale MOO algorithm with bounded regret.
The proof, given in Appendix~\ref{appdx:proofs}, leverages the hypervolume regret bound from \citet{zhang2020random}.
However, our regret bound is with respect to the number of restart points (rather than evaluations), a difference that can be viewed as a cost of focusing on large-scale problems that BO with global GPs cannot address.
Moreover, our regret analysis in terms of the number of restarts is similar to the convergence guarantees of gradient-based TR optimization methods~\citep{Yuan_areview} and can be viewed as a multi-objective analogue of the performance guarantees of recent single-objective BO-based TR methods \citep{wan2021think}.
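
To see what the bound in Theorem~\ref{thm:hv_regret} implies for the average regret, divide both sides by $T$:
\begin{equation*}
    \frac{R_T}{T} \leq M^2(\sqrt{2e\pi}B/2)^M \sqrt{\frac{d\gamma_T \ln(T)}{T}}.
\end{equation*}
As an illustrative special case (the choice of kernel here is an assumption and is not required by the theorem), the squared exponential kernel satisfies $\gamma_T = O((\ln T)^{d+1})$ \citep{ucb}, which gives $R_T / T = O\big(\sqrt{(\ln T)^{d+2}/T}\big) \to 0$ as $T \to \infty$ for fixed $d$, $M$, and $B$. Since the smallest instantaneous regret $\min_{t \leq T} R(\mathcal P_t)$ is at most the average $R_T/T$, it vanishes at the same rate.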


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Experiments}
% \vspace{-1ex}
\label{sec:Experiments}
We evaluate \ALG on an extensive suite of benchmarks with various numbers of input parameters ($d$), objectives ($M$), and constraints ($V$).
In Appendix~\ref{appdx:baby_problems}, we consider a vehicle ($d=5$) and a welded beam ($d=4$, $V=4$) design problem to show that \ALG{} is competitive with other algorithms on problems it was not designed for.
We consider three challenging real-world problems: a trajectory planning problem ($d=60$), a problem of designing optical systems for AR/VR applications ($d=146$), and an automotive design problem ($d=222, V=54$).
In addition, we evaluate \ALG{} on DTLZ3, DTLZ5, and DTLZ7 problems with $2$/$4$ objectives ($6$ problems in total) in Appendix \ref{appdx:additional_results}.

We compare \ALG to multi-objective BO methods ($q$NEHVI, $q$ParEGO, TS-TCH, TSEMO, DGEMO, MOEA/D-EGO), recent work leveraging search space partitioning (LaMOO-CMAES, LaMOO-$q$NEHVI), a widely used evolutionary algorithm (NSGA-II), and Sobol, a quasi-random baseline in which designs are sampled from a scrambled Sobol sequence \citep{owen2003quasi} (see Appendix~\ref{appdx:experimentdetails} for more details on the methods).
% approximate Thompson sampling using random Fourier features~\citep{rahimi_rff} with Chebyshev scalarizations (), TSEMO, DGEMO, MOEA/D-EGO, and Sobol quasi-random search.
\ALG is implemented using BoTorch~\citep{balandat2020botorch} and the code is available at \url{https://github.com/facebookresearch/morbo}.
We run all methods for $20$ replications and initialize them using the same quasi-random initial points for each replication.
We use the same hyperparameters for \ALG on all problems and analyze the sensitivity of \ALG to its hyperparameters in Figure~\ref{fig:ablation_study}.
See Appendix~\ref{appdx:experimentdetails} for details on the experiment setup.
All experiments used a Tesla V100 SXM2 GPU (16GB RAM).
\begin{figure*}[!ht]
    \centering
    \includegraphics[width=0.96\textwidth]{figures/real_world2.pdf}
    \caption{
        (Left) \ALG outperforms other methods on the trajectory planning problem ($d=60$).
        (Middle) Illustration of the results on the optical design problem ($d=146$). NSGA-II performs better than the BO baselines but is not competitive with \ALG.
        (Right) \ALG shows compelling performance on the Mazda vehicle design problem ($d=222$) with $54$ black-box constraints. For all plots, we show the mean and one standard error of the mean over 20 replications.
    }
    \label{fig:real_world_experiments}
\end{figure*}
% \vspace{-1ex}
\subsection{Large-Scale Real-World Problems}
% \vspace{-1ex}
\paragraph{Trajectory Planning}
\label{sec:real_world_problems}
We consider a trajectory planning problem similar to the rover trajectory planning problem considered by~\citet{wang2018batched}.
As in the original problem, the goal is to find a trajectory that maximizes the reward when integrated over the domain.
The trajectory is determined by fitting a B-spline to $30$ design points in the two-dimensional plane, which yields a $60$-dimensional optimization problem.
In this experiment, we constrain the trajectory to begin at the pre-specified starting location, but we do not require it to end at the desired target location.
In addition to maximizing the reward of the trajectory, we also minimize the distance from the end of the trajectory to the intended target location.
Intuitively, these two objectives are expected to be competing because reaching the exact end location may require passing through areas with lower associated reward.
The results from {$2$,$000$} evaluations using batch size $q=50$ and $200$ initial points are presented in Figure~\ref{fig:real_world_experiments}, which shows that \ALG performs best and that even state-of-the-art methods such as $q$NEHVI do not outperform NSGA-II.

% \vspace{-1ex}
\paragraph{Optical design problem}
% \label{sec:optical_design}
We consider the problem of designing an optical system for an augmented reality (AR) see-through display.
%\footnote{The code for the optical design problem is proprietary, but we aim to open source a surrogate model by the time of publication.}
This optimization task has $146$ parameters describing the geometry and surface morphology of multiple optical elements in the display stack.
Several objectives are of interest in this problem, including display efficiency and display quality.
Each evaluation of these metrics requires a computationally intensive physics simulation %of the optical system
that takes several hours to run.
In this benchmark, the task is to explore the Pareto frontier between display efficiency and display quality (both objectives are normalized w.r.t. the reference point).
%
We consider $250$ initial points, batch size $q=50$, and a total of {$10$,$000$} evaluations.
This is out of reach for the other BO baselines due to runtime considerations, and so we run $q$NEHVI, $q$ParEGO, TS-TCH, TSEMO, and MOEA/D-EGO for {$2$,$000$} evaluations and DGEMO for {$1$,$000$} evaluations.
We were only able to run LaMOO-CMAES for {$7$,$600$} evaluations before it ran out of GPU memory.
Figure~\ref{fig:real_world_experiments} shows that \ALG achieves substantial improvements in sample efficiency compared to NSGA-II.
Furthermore, observe that no other baselines are competitive with NSGA-II except in the very small sample regime (fewer than $500$ evaluations).

\begin{figure*}[!ht]
    \centering
    \includegraphics[width=0.96\textwidth]{figures/ablation_log_scale_dtlz2_update_scale.pdf}
    \caption{
        We investigate the sensitivity of \ALG with respect to its hyperparameters.
        We observe that using multiple TRs performs significantly better than using a single TR and that data-sharing and the use of a hypervolume-based acquisition function are important components of \ALG.
    }
    \label{fig:ablation_study}
\end{figure*}

% \vspace{-1ex}
\paragraph{Mazda vehicle design problem}
% \label{sec:mazda}
We consider the $3$-car Mazda benchmark problem~\citep{kohira2018proposal}.
This challenging MOO problem involves tuning $222$ decision variables that represent the thickness of different structural parts.
The goal is to minimize the total vehicle mass of the three vehicles (Mazda CX-$5$, Mazda $6$, and Mazda $3$) and to maximize the number of parts shared across vehicles.
Additionally, there are $54$ black-box output constraints (evaluated jointly with the two objectives) that enforce that designs meet performance requirements such as collision safety standards.
This problem is, to the best of our knowledge, the largest MOO problem considered by any BO method and requires fitting $56$ GP models to the objectives and constraints.
The original problem underlying the Mazda benchmark was solved on what at the time was the world's fastest supercomputer and took around $3$,$000$ CPU years to compute~\citep{oyama2017mazda}.
We consider a budget of {$10$,$000$} evaluations using batches of size $q=50$ and $300$ initial points.

Figure~\ref{fig:real_world_experiments} demonstrates that \ALG clearly outperforms the other methods.
A feasible design satisfying the black-box constraints was provided to all methods for all replications as part of the initial $300$ design points.
However, in subsequent evaluations Sobol did not find another feasible design, illustrating the challenge of satisfying the $54$ constraints.
While NSGA-II made progress from the initial feasible solution, it is not competitive with \ALG.
NSGA-II and Sobol are the only applicable baselines because standard multi-objective BO methods are impractically slow with $56$ \emph{global} GPs and LaMOO does not support black-box constraints.
% \vspace{-1ex}
\subsection{Ablation study}\label{sec:ablation}
% \vspace{-2ex}
Finally, we study the sensitivity of \ALG with respect to the number of TRs ($n_\text{TR}$) and the failure tolerance ($\tau_\text{fail}$), as well as the effect of sharing observations across TRs, local modeling, the HVI acquisition function, and the re-initialization strategy.
Using several TRs allows \ALG to explore different parts of the search space that potentially contribute to different parts of the Pareto frontier.
The failure tolerance controls how quickly each TR shrinks:
a large $\tau_\text{fail}$ leads to slow shrinkage and potentially too much exploration, while a small $\tau_\text{fail}$ may cause each TR to shrink too quickly and not collect enough data.
\ALG uses $5$ TRs and $\tau_\text{fail} = \max(10, \frac{d}{3})$ by default, similar to what is used by~\citet{eriksson2019turbo}.
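
For concreteness, evaluating the default $\tau_\text{fail} = \max(10, \frac{d}{3})$ on the three large-scale problems considered in this section gives (before any rounding, which we leave unspecified here)
\begin{equation*}
    d = 60: \ \tau_\text{fail} = 20, \qquad d = 146: \ \tau_\text{fail} \approx 48.7, \qquad d = 222: \ \tau_\text{fail} = 74,
\end{equation*}
whereas any problem with $d \leq 30$ uses the lower bound $\tau_\text{fail} = 10$.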

We consider the DTLZ2 problem ($d=100$, $M=2$), the trajectory planning problem ($d=60$, $M=2$), and the optical design problem ($d=146$, $M=2$).
Figure~\ref{fig:ablation_study} shows that \ALG with the default settings performs well on all three problems.
We observe that multiple TRs and the HVI acquisition function are important as neither a single TR nor a Chebyshev scalarization performs well.
The performance of \ALG is robust to the choice of failure tolerance except on the optical design problem, where using a value of $10$ is clearly worse than the default and causes the TRs to shrink too quickly.
Not sharing data between TRs results in inferior results on the DTLZ2 and optical design problems.
While using a global GP model achieves good results on the DTLZ2 and trajectory planning problems, it does not perform as well on the optical design problem.
A global GP also comes at a high computational cost.
With a global GP, running \ALG with a budget of {$10$,$000$} evaluations on the optical design problem required $30$ hours of computational overhead, whereas \ALG with local models completed {$10$,$000$} evaluations in less than an hour.
Lastly, we find consistently strong performance for both our default HV scalarization-based re-initialization strategy and a strategy that selects a new design at random (denoted as ``Random restart points'').
The former allows us to bound \ALG's regret.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \vspace{-1ex}
\section{Discussion}
% \vspace{-1ex}
\label{sec:Discussion}
We proposed \ALG, an algorithm for multi-objective BO over high-dimensional search spaces.
By using a coordinated, collaborative multi-trust-region approach with scalable local modeling, \ALG scales gracefully to high-dimensional problems and high-throughput settings.
In a comprehensive experimental evaluation, we showed that \ALG allows us to \emph{effectively tackle important real-world problems that were previously out of reach for existing BO methods}.
We showed that \ALG achieves substantial improvements in sample efficiency compared to existing state-of-the-art methods such as evolutionary algorithms.
Due to the lack of alternatives, NSGA-II has been the method of choice for many practitioners, and we expect \ALG to provide practitioners with significant savings in terms of time and resources across the many disciplines that require solving challenging optimization problems.

However, there are some limitations to our method.
Although \ALG can handle a large number of black-box constraints, using a hypervolume-based acquisition function means that the computational complexity scales poorly with the number of objectives.
Furthermore, \ALG is optimized for the large-batch high-throughput setting and other methods may be more suitable for and achieve better performance on low-dimensional problems with small evaluation budgets.
% \begin{contributions}
% Briefly list author contributions.
% This is a nice way of making clear who did what and to give proper credit.
% H.~Q.~Bovik conceived the idea and wrote the paper.
% Coauthor One created the code.
% Coauthor Two created the figures.
% \end{contributions}

% \begin{acknowledgements}
% Briefly acknowledge people and organizations here.
% \emph{All} acknowledgements go in this section.
% \end{acknowledgements}

\newpage
% \addcontentsline{toc}{subsubsection}{References}
% {
%     \small
%     \bibliographystyle{abbrvnat}
%     \bibliography{ref}
% }
\bibliography{daulton_446}

%%%%% UNCOMMENT FOR ARXIV TO INCLUDE APPENDIX IN THE SAME PDF %%%%%
% \appendix
% \onecolumn
% \input{daulton_446-supp-input}

\end{document}
