% \documentclass{uai2024} % for initial submission
\documentclass[accepted]{uai2024} % after acceptance, for a revised version; 
% also before submission to see how the non-anonymous paper would look like 
                        
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2024} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2024} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
\usepackage{amssymb} 
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams
\usepackage{soul}
\usepackage{amsthm}
\usepackage{algorithm2e}
\usepackage[dvipsnames]{xcolor}
\usepackage{dblfloatfix}
% \usepackage{algorithm}
% \usepackage{algpseudocode}
\theoremstyle{definition}
\newtheorem{proposition}{Proposition}
\newtheorem{lemma}{Lemma}
\newtheorem{remark}{Remark}


%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

%\title{Adaptive Clinical Trial Recruitment: A Machine Learning Approach}
\title{Enhancing Patient Recruitment Response in Clinical Trials: an Adaptive Learning Framework}

% The standard author block has changed for UAI 2024 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{Xinying~Fang}
\author[1]{\href{mailto:<shouhao.zhou@psu.edu>?Subject=Your UAI 2024 paper}{Shouhao~Zhou}{}}

\affil[1]{%
    Division of Biostatistics and Bioinformatics\\
    Department of Public Health Sciences\\
    Penn State University College of Medicine\\
    Hershey, Pennsylvania, USA
}


\newcommand{\yx}{\color{blue}\em } % begin change
\newcommand{\yy}{\color{red}\em } % begin change
\newcommand{\xx}{\color{black}\rm } % end change
  
  \begin{document}
\maketitle

\begin{abstract}
  Patient recruitment remains a key challenge in contemporary clinical trials, often leading to trial failures due to insufficient recruitment rates.  To address this issue, we introduce a novel adaptive learning framework that integrates machine learning methods to facilitate evidence-informed recruitment. Through dynamic testing, predictive learning, and adaptive pruning of recruitment plans, the proposed framework ensures superiority over the conventional random assignment approach. We discuss the practical considerations for implementing this framework and conduct a simulation study to assess the overall response rates and chances of improvement. The findings suggest that the proposed approach can substantially enhance patient recruitment efficiency. By systematically optimizing recruitment plan allocation, this adaptive learning framework shows promise in addressing recruitment challenges across broad clinical research settings, potentially transforming how patient recruitment is managed in clinical trials.
\end{abstract}

\section{Introduction}\label{sec:intro}
Patient recruitment is a principal challenge in conducting clinical trials \citep{Friedman2015}. In a recent survey \citep{ehealth}, 86\% of clinical trials did not meet enrollment timelines, and approximately one-third of phase III trials, the most rigorous clinical studies, which often take 5--15 years to implement and cost hundreds of millions of dollars, failed owing to participant enrollment problems. Our own experience mirrors these challenges: in the PCORI-funded WISE trial, slower-than-expected recruitment led to significant alterations \citep{Sciamanna2018}, including a modification of the study's primary endpoint and a reduction of the total sample size by half. %\xx  table ... \xx

% what has been done in clinical trials to improve recruitment rate:
Efforts have been undertaken to enhance patient recruitment in clinical trials. The recruitment guideline developed by the GREET project (Guidance to Recruitment: Examining Experiences at Clinical Trial Sites) \citep{greet} identifies the availability of adequate staff resources, appropriate budget allocation, and proactive principal investigators as the top three facilitators of successful recruitment. However, while these solutions are effective within specific trial contexts, their generalizability and efficiency are not guaranteed, because recruitment is subject to both predictable and entirely unforeseen problems \citep{Friedman2015}. 

Recruitment requirements are particularly strict for pragmatic trials, such as studies that evaluate participants in the ``SilverSneakers'' program \citep{SilverSneakers_2023}. Because pragmatic trials are designed to assess the efficacy of interventions under real-world, routine practice conditions \citep{Patsopoulos2011} and seek maximal heterogeneity in clinical settings and patient characteristics, they require a large sample size to give the intervention the best chance to demonstrate a beneficial effect \citep{mosio}. Thus, with limited resources, efficient patient recruitment is vital to enhance generalizability across a wide range of participants.

Artificial Intelligence (AI) and Machine Learning (ML) have great potential in trial participant identification and selection. Using automated natural language processing tools, AI can effectively connect individuals to trials and thereby increase participant identification \citep{Miller2023, Weissler2021}; ML, particularly through neural network models, can reduce sample heterogeneity by identifying patients with specific characteristics who are predicted to benefit, for use in patient selection \citep{Harrer2019,twostageML}. Despite these advancements, significant challenges persist, particularly in effectively encouraging participant responses to recruitment efforts. Even in a pool of well-identified potential participants, the recruitment response rate can be intolerably low (e.g., 3.2\% in the WISE trial \citep{sciamanna2021working}, and 3\% projected for Hispanic and Latino groups in the TIME trial \citep{PCORI-2}), which makes it difficult to meet enrollment targets and can jeopardize the success of a trial. Addressing this issue calls for the development and implementation of more effective and customized recruitment plans. However, owing to the lack of specific data or evidence in individual trial contexts (e.g., various comparative interventions and targeted disease populations), it remains challenging to accurately predict the effectiveness of various recruitment plans and to estimate the achievable recruitment response rates. The need for innovative approaches that can navigate these complexities and effectively increase recruitment response rates is evident, marking an important direction for further exploration and advancement using ML.


This paper aims to improve clinical trial recruitment by using ML to develop an evidence-informed adaptive recruitment framework. Specifically, we leverage predictive learning techniques over multiple candidate recruitment plans in a sequential recruitment setting. As shown in Figure \ref{fig:conceptgraph}, the process runs for up to $T$ rounds until the most effective recruitment plans are identified. Each round consists of four steps: data collection, quantitative modeling, recruitment plan allocation prediction, and sample size determination. By adaptively refining the selection of effective recruitment plans, we aim to enhance participant engagement through optimized recruitment plan allocation. 

\captionsetup{font=small} 
\begin{figure}[t]
  \centering
  \includegraphics[width=1\linewidth]{Concept graph.png}% allocation rate prediction % Quantitative modeling
  \caption{Concept graph of learning strategy within each round of recruitment.}\label{fig:conceptgraph}
\end{figure}

\xx
\vspace{3pt}
% \section{Difference of the proposed method with sequential adaptive designs} \label{sec:seq_adaptive_designs}
\textbf{Related works: }Sequential trial designs \citep{karrison2003group,li1995adaptive} adaptively update the weights of arm allocation to minimize the risk of inferior treatment assignments \citep{hu2006theory}. The proposed recruitment strategy shares some similarities with sequential designs in adaptive learning, but they are distinct in the following aspects:
\begin{itemize} \vspace{-5pt}
    \item \textit{Study focus}: Sequential adaptive designs have an ``arm-oriented'' focus, aiming to identify the best treatment (arm) among the tested treatments. In contrast, the adaptive learning methods in our trial recruitment have a ``response-oriented'' focus: they aim to improve the overall recruitment response rate until little further improvement can be made, irrespective of which recruitment plan (arm) achieves a good response.
    \item \textit{Scale of interventions}: Traditional sequential adaptive designs can only handle a few treatments (small $K$) using classical statistical methods, whereas our approach is better suited for AI/ML techniques to systematically search within a large space of recruitment plans (large $K$). 
    \item \textit{Stage in clinical trials}: Sequential adaptive designs are implemented to allocate treatments during the intervention stage, which is often costly and takes several years to run even with a limited sample size. In contrast, the proposed adaptive learning framework targets the recruitment stage, which is fast-paced with a huge sample space (e.g., 175,000 in the SilverSneakers study).
\end{itemize}\vspace{-3pt}

Consequently, the design methodologies required for these two types of studies differ significantly. While statistical approaches are often developed and applied for treatment assignment, ML methods are naturally suited to optimizing efficiency in the recruitment setting. Our proposed approach establishes an adaptive ML framework to enhance participant recruitment in clinical trials. To the best of our knowledge, this application of ML techniques to improve recruitment efficiency is new to the field.

\xx 

\section{An Illustrative %Clinical 
Trial Example%: SilverSneakers Exercise Program
}\label{sec:silversneaker}

A PCORI-funded clinical trial investigates the health and social effects of proactively utilizing ``SilverSneakers'', an insurance-covered exercise program, among seniors with osteoarthritis \citep{SilverSneakers_2023}. Osteoarthritis is a common medical condition associated with pain, deterioration, and an increased risk of falls and fractures in people aged 65 and over. In this pragmatic, randomized, parallel-group controlled trial, the randomization unit is the individual participant, who is randomly assigned with a 1:1 ratio to either the treatment arm (a proactive care condition that provides support to activate insurance-funded SilverSneakers benefits) or the control arm (a usual care condition that provides beneficiaries with their usual SilverSneakers benefits information).

Scheduled for 2024--2025, this trial is budgeted to send out 175,000 recruitment letters for an enrollment target of 1,454 U.S. Medicare Advantage members. Despite SilverSneakers' substantial potential benefits and the large number of planned recruitment letters, the study still faces a significant challenge: recruiting enough participants. This concern is pivotal given the often lower recruitment response rates in disadvantaged groups, which makes participant recruitment a notably challenging task. 

To enhance the recruitment process, a practical approach involves employing diverse recruitment modalities, such as utilizing different designs and features in recruitment letters to elicit higher response rates. These design features, each presented as a categorical variable with two or more levels, can be used individually or in combination, creating a high-dimensional sample space of recruitment (letter) plans. 

In this vast sample space, it is critical to predict the most effective design features, or their combinations, to improve trial recruitment responses. Yet pre-trial knowledge is scarce. Before a trial begins, we have limited understanding of how potential participants might react to recruitment plans, owing to differences in the proposed interventions, the targeted study population, and the specific trial context. In behavioral research, expert consensus can easily deviate substantially from, or even contradict, actual outcomes \citep{milkman2021megastudies}. This lack of foresight extends to predicting the effectiveness or ranking of recruitment plans. 

This example illustrates an urgent need for new methodologies that enhance recruitment efficiency using adaptive strategies of learning and prediction. The integration of ML into sequential participant recruitment fills a gap in the existing literature and has the potential to change clinical trial practice.


\section{Method}
The procedure of sequential participant recruitment aims to enhance the overall response rate (ORR) by optimizing the recruitment plan allocation. In this section, we first delineate the notations pertinent to the proposed approach, then delve into the modeling and design considerations essential for the selection of recruitment plan(s).

\vspace{-5pt}
\subsection{Notations}
Suppose we have $K$ candidate recruitment plans to be distributed to $N$ potential participants. We assume that each participant receives exactly one recruitment plan, at a randomly assigned round $t \in \{1, \ldots, T_0\}$, where $T_0$ is the planned maximum number of recruitment rounds, and that the plan is assigned at random with allocation probabilities $w_k^{(t)}$, $k=1,\ldots,K$. The $k$th recruitment plan has a true response rate $p_k$, which is fixed but unknown and can only be estimated from current trial data. We allow the adaptive recruitment process to stop early, so the actual total number of recruitment rounds satisfies $T\leq T_0$.

In the context of the \textit{SilverSneakers} trial with 8 binary design features, we specify $K=2^8=256$ letter designs, $T_0=6$ maximum rounds, and $N=175{,}000$ potential participants (i.e., each individual receives only one %randomly assigned
recruitment letter).

\begin{figure}[!htb]
  \centering
  \includegraphics[width=1\linewidth]{AdaptiveProcedure.png}
  \caption{Adaptive procedure for recruitment plan assignment.}\label{fig:adaptiveprocedure}
\end{figure}

In a sequential recruitment process (e.g., Figure \ref{fig:adaptiveprocedure}), the $N$ potential participants are randomly partitioned into $T$ sequential cohorts, and the individuals in cohort $t$ are contacted by the clinical team only at round $t$ of recruitment. The maximum number of potential participants involved in a non-terminal round (i.e., $t<T$) is $N/T_0$. The assignment probability $w_k^{(t)}$ for the $k$th recruitment plan may vary over time (as indicated by the superscript) and can be updated using the data $D^{(t-1)}$ collected over the previous $t-1$ rounds of recruitment.
\xx
If the adaptive learning approach is effective, we expect the assignment probabilities $w_k^{(t)}$ for high-performing recruitment plans (those with high response rates $p_k$) to increase over time. Conversely, the probabilities should decrease, potentially reaching zero, for underperforming recruitment plans with low response rates $p_k$. Overall, the algorithm will dynamically prioritize the more successful strategies while phasing out ineffective ones.
\xx
For consistent notation, we denote $D^{(0)}$ as the prior data before initiating the recruitment. If there are no preliminary studies,  $D^{(0)}=\varnothing$.%and set $w_k^{(1)}=1/K$ for initial equal assignment probability at round 1. 

\vspace{-5pt}
\subsection{Sequential recruitment procedure for adaptive learning}\label{sec:procedure}
Algorithm \ref{alg:adaptive} presents the general recruitment strategy of the adaptive learning framework (Figure \ref{fig:adaptiveprocedure}), which aims to allocate effective recruitment plans to improve the overall recruitment response rate. In the initial round ($t=1$), all recruitment plans are assigned equal probabilities $w_k^{(1)} = 1/K$, $k = 1,\ldots, K$, and participant responses are collected as $D^{(1)}$. For subsequent rounds $t > 1$, we follow the steps of Figure \ref{fig:conceptgraph} to perform the adaptive allocation. Specifically, at round $t$, we have response data $D^{(t-1)}$ from previous rounds. An (ensemble) learning model is applied to $D^{(t-1)}$ to predict the response rates of the recruitment plans, $\hat{\boldsymbol{p}}^{(t-1)}$. The allocation rates $\boldsymbol{W}^{(t)}$ are derived from the predicted response rates $\hat{\boldsymbol{p}}^{(t-1)}$, and we randomly assign the recruitment plans to participants in cohort $t$ according to $\boldsymbol{W}^{(t)}$. New data $D^{(t)} \setminus D^{(t-1)}$ are thus collected after potential participants respond to the assigned plans. This iterative process continues until the maximum number of rounds $T_0$ is reached or one of the early termination conditions 3(a)--(c) in Algorithm \ref{alg:adaptive} is met.
\xx

Mathematically, the key step (i.e., step 3 in Figure \ref{fig:conceptgraph}) is the determination of the cohort-$t$-specific allocation rates $\boldsymbol{W}^{(t)} = (w_1^{(t)}, w_2^{(t)}, \ldots, w_K^{(t)})$, with
\[
w_k^{(t)} \propto f_k(\hat{p}_1^{(t-1)}, \ldots, \hat{p}_K^{(t-1)}) \cdot g_k^{(t)}(\hat{p}_1^{(t-1)}, \ldots, \hat{p}_K^{(t-1)}),
\]
where $f_k$ is a pre-specified randomization rule, $g_k^{(t)}$ is an adaptive pruning factor that can be used to downweight recruitment plans that respond poorly, and $\sum_{k=1}^K w_k^{(t)}=1$.


\RestyleAlgo{ruled}
%% This declares a command \Comment
%% The argument will be surrounded by /* ... */
\SetKwComment{Comment}{/* }{ */}

\begin{algorithm*}[!hbt]
\caption{Procedure for Adaptive Learning Framework}\label{alg:adaptive}
\textbf{Inputs:} initial round $t = 1$, total number of rounds $T=T_0$, sample size for round 1 $n^{(1)} = N/T_0$.

 \While{$t \le T$}{
  
  \eIf{$t = 1$}{
    Randomly assign all patients in cohort 1 according to $w_k^{(1)}=1/K$, where $k=1, ..., K$, and obtain data $D^{(1)}$\;
   }{
   \begin{minipage}{0.85\linewidth}
   \begin{enumerate}[leftmargin=*]
       \item Given the data $D^{(t-1)}$ collected up to round $t-1$, we apply the ensemble model to predict the plan response rates $\hat{\boldsymbol{p}}^{(t-1)}=(\hat{p}_1^{(t-1)}, \hat{p}_2^{(t-1)}, ..., \hat{p}_K^{(t-1)})$;
       \item Calculate the allocation rates $\boldsymbol{W}^{(t)} = (w_1^{(t)}, w_2^{(t)}, ..., w_K^{(t)})$ with
      $$w_k^{(t)} \propto f_k(\hat{p}_1^{(t-1)}, \hat{p}_2^{(t-1)}, ..., \hat{p}_K^{(t-1)}) \cdot g_k^{(t)}(\hat{p}_1^{(t-1)}, \hat{p}_2^{(t-1)}, ..., \hat{p}_K^{(t-1)})$$ based on pre-specified randomization rule $f_k$, pruning factor $g_k^{(t)}$, and $\sum_{k=1}^K w_k^{(t)}=1$;
       \item \textbf{if} $t<T$ \textbf{then} \textbf{if}
       % \hspace*{3mm}
        \begin{enumerate}[leftmargin=+.5in]
          \item $n_{\text{min}}^{(t)}<0$ (the precision of the observed ORR has met the power requirement);\\
          \textbf{or} \item $\exists \ k$, $w_k^{(t)}=1$ (single recruitment plan selected for next cohort sampling);\\ \textbf{or} \item $\widehat{ORR}^{(t)} - \widehat{ORR}^{(t-1)} < \epsilon$ (limited improvement on predicted ORR);
        \end{enumerate}
        \hspace*{4mm} \textbf{then} (Early stopping) %(if any of above 3 conditions meet)
        \begin{itemize}[leftmargin=+.5in]
            \item[] Terminate the adaptive learning with $T=t$ by combining all remaining samples into a single cohort with sample size $n^{(t)}=N-\sum_{s=1}^{t-1} n^{(s)}$;
        \end{itemize}
          
        \textbf{else} 
        \begin{itemize}[leftmargin=+.2in]
            \item[] Calculate cohort $t$ sample size, $n^{(t)}$ (Section \ref{sec:samplesizecalc}); %Identify the minimum sample size required to reach a certain improvement of the overall response rate;
        \end{itemize}
        
      \item Randomly assign recruitment plans $1, ..., K$ to individuals in cohort $t$ according to $\boldsymbol{W}^{(t)}$ and collect response data, which will be combined with data $D^{(t-1)}$ collected in previous rounds to generate the updated data $D^{(t)}$;
      \item $t=t+1$;
   \end{enumerate}
    \end{minipage}
  }
 }
\KwResult{Participant response data collected up to round $T$, $D^{(T)}$, and overall response rate over $N$ samples, $ORR^{(T)}$.}
\end{algorithm*}

The learning performance may vary with the choices of $f_k$ and $g_k^{(t)}$, which yield different power and false discovery rate control. In the simulation study (Section \ref{sec:simulation}), we test the adaptive learning performance using the simple rule $f_k(\hat{p}_1^{(t-1)}, \ldots, \hat{p}_K^{(t-1)})=\hat{p}_k^{(t-1)}$, which is proportional to the predicted response rate. Additionally, we set the adaptive pruning factor $g_k^{(t)}=1$ for the cluster of recruitment plans with the highest predicted response rates in round $t$, denoted by $C^{(t)}$. All recruitment plans $k \notin C^{(t)}$ are pruned with $g_k^{(t)}=0$.
The set $C^{(t)}$ is determined by applying the K-means method \citep{Lloyd1982, MacQueen1967} within the previously selected set $C^{(t-1)}$ from round $t-1$, so that the monotonic condition on $g_k^{(t)}$ introduced below is satisfied. The silhouette score \citep{Rousseeuw1987} is used to select the number of clusters.
If the framework proves effective in this simple setting, we expect the adaptive learning performance to be further enhanced with tailored ML methods in real data applications.
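
As a small illustration of how the allocation rates are formed, consider purely hypothetical values that are not taken from any trial: $K=4$ plans with predicted response rates $\hat{\boldsymbol{p}}^{(t-1)}=(0.05, 0.04, 0.02, 0.01)$, and suppose the retained cluster is $C^{(t)}=\{1,2\}$. Taking $f_k=\hat{p}_k^{(t-1)}$ and $g_k^{(t)}=\mathbf{1}\{k\in C^{(t)}\}$, normalizing $f_k g_k^{(t)}$ gives
\begin{align*}
\boldsymbol{W}^{(t)} &= \frac{(0.05,\ 0.04,\ 0,\ 0)}{0.05+0.04} = \left(\tfrac{5}{9},\ \tfrac{4}{9},\ 0,\ 0\right),\\
\sum_{k=1}^{4} w_k^{(t)}\,\hat{p}_k^{(t-1)} &= \tfrac{5}{9}(0.05)+\tfrac{4}{9}(0.04) \approx 0.0456,
\end{align*}
whereas equal allocation over all four plans would give a predicted response rate of $(0.05+0.04+0.02+0.01)/4 = 0.03$. The numbers serve only to show the mechanics of the normalization; the cluster membership here is assumed rather than computed by K-means.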

We now discuss the theoretical properties in this setting.
\begin{lemma}
For any rule $f_k(p_1, p_2, \ldots, p_K) \propto p_k$, 
\[\sum_k w_k p_k = \frac{\sum_k f_k p_k}{\sum_k f_k} \geq \frac{\sum_k p_k}{K}.\]
\label{lem:1} %$\widehat{ORR}^{(t)} - \widehat{ORR}^{(t-1)} > 0$
\end{lemma}\vspace{-16pt}
The proof of Lemma \ref{lem:1} uses the Cauchy--Schwarz inequality (Supplementary Section \ref{sec:proof_lemma1}), and equality holds \textit{iff} $p_1= p_2= \cdots= p_K$. Combining Lemma \ref{lem:1} with the law of large numbers leads to the following remark, which suggests that it is always safe to apply a consistent learning strategy in recruitment:
\begin{remark} (\textit{Non-inferiority})
If a learning method yields consistent recruitment response estimators (i.e., $\lim_{n \to \infty} \hat{p}_k = p_k$, for $k=1, \ldots, K$), then a recruitment strategy based on $f_k(\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_K) \propto \hat{p}_k$ will be statistically non-inferior to the conventional strategy that assigns recruitment plans with equal probability (i.e., $f_k(\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_K) = 1/K$), with probability 1. \label{rm:1}
\end{remark}\vspace{-4pt}
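
To make Lemma \ref{lem:1} concrete, one way to see the inequality (a sketch; the full argument is in the supplement) is to apply the Cauchy--Schwarz inequality to the vectors $(p_1,\ldots,p_K)$ and $(1,\ldots,1)$, assuming $\sum_k p_k>0$:
\begin{align*}
\Big(\sum_{k=1}^K p_k\cdot 1\Big)^2 \leq \Big(\sum_{k=1}^K p_k^2\Big)\Big(\sum_{k=1}^K 1^2\Big) = K\sum_{k=1}^K p_k^2
\quad\Longrightarrow\quad
\frac{\sum_k p_k^2}{\sum_k p_k} \geq \frac{\sum_k p_k}{K}.
\end{align*}
For example, with the hypothetical response rates $(p_1,p_2,p_3)=(0.05, 0.03, 0.01)$, chosen only for illustration,
\begin{align*}
\frac{\sum_k p_k^2}{\sum_k p_k} = \frac{0.0025+0.0009+0.0001}{0.09} \approx 0.0389
\;\geq\;
\frac{\sum_k p_k}{K} = \frac{0.09}{3} = 0.03,
\end{align*}
so allocation proportional to the response rates yields a higher expected response rate than equal allocation whenever the rates differ.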

The next lemma concerns adaptive recruitment plan selection under a pruning factor $g_k^{(t)} \in \{0,1\}$, $t \in \{1,\ldots,T\}$, that satisfies the boundary constraints
\begin{eqnarray*}
    g_k^{(1)} &=&  1, \text{ for } k=1,\ldots, K,\\
    \sum_{k=1}^K g_k^{(T)} &\geq&  1,
\end{eqnarray*} 
and, to consistently exclude the less effective recruitment plans retained from the preceding round, the monotonic condition
\begin{eqnarray*}
       g_k^{(t)} &\leq& g_k^{(t-1)},\\
       \sum_{k=1}^K g_k^{(t)} &<&  \sum_{k=1}^K g_k^{(t-1)},
       \text{ for } 2\leq t\leq T.
\end{eqnarray*} 

\begin{lemma} (\textit{Optimality})
Without loss of generality, we assume that the true recruitment response rates satisfy $1 \geq p_1>p_2>\cdots>p_K \geq 0$.  For any consistent rule $f_k(p_1, p_2, \ldots, p_K) \propto p_k$, if combined with a strict pruning factor $g^{(t)}_k(p_1, p_2, \ldots, p_K)$ satisfying
$$ g_k^{(T)}(p_1, p_2, \cdots, p_K) =\begin{cases} 1, & k=\text{argmax}_k p_k = 1\\  0, &  k \neq 1 
       \end{cases},$$
we have 
\begin{eqnarray*}
    \sum_k w_k^{(t)} p_k &=& \sum_k f_k g_k^{(t)} p_k / \sum_k f_k g_k^{(t)} \\
    &\leq& \sum_k f_k g_k^{(T)} p_k / \sum_k f_k g_k^{(T)}= p_1.
\end{eqnarray*}
\label{lem:2} %$\widehat{ORR}^{(t)} - \widehat{ORR}^{(t-1)} > 0$
\end{lemma}\vspace{-16pt}
The proof is included in Supplementary Section \ref{sec:proof_lemma2}. Lemma \ref{lem:2} suggests that, when the recruitment plans can be completely ranked by their recruitment response rates, the optimal response rate is achieved, at least in the last cohort, by adopting the strict pruning factor to select the most effective recruitment plan. This result generalizes readily to any semi-strict pruning factor $g^{(t)}_k(p_1, p_2, \ldots, p_K)$ with
\[
g_k^{(T)}(p_1, p_2, \ldots, p_K) =\begin{cases} 1, & k \in \{1,\ldots,K_0\}\\  0, &  k \in \{K_0+1,\ldots, K\}       \end{cases}
\]
for the best subset of $K_0$ recruitment plans, which gives
\begin{eqnarray*}
    \sum_k w_k^{(t)} p_k &\geq& \sum_k w_k^{(t-1)} p_k, \text{ for  } 2\leq t\leq T \\
  \text{and }  \sum_k w_k^{(T)} p_k &\geq& p_{K_0}.
\end{eqnarray*} 
This result is important because, in practice, we do not expect $T$ to be large enough, with many rounds of learning, to guarantee the identification of the single most effective recruitment plan; nonetheless, an adaptive pruning process can still exclude some ineffective recruitment plans and improve the recruitment responses. The K-means-derived $g_k^{(t)}$ is a typical example.
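
For a small numerical illustration of these inequalities, consider a hypothetical setting (values chosen only for this example) in which the true rates are known, $f_k=p_k$, $K=3$ with $(p_1,p_2,p_3)=(0.06, 0.04, 0.02)$, and one plan is pruned per round. Writing $\boldsymbol{g}^{(t)}=(g_1^{(t)},g_2^{(t)},g_3^{(t)})$ and using $w_k^{(t)} \propto f_k g_k^{(t)}$,
\begin{align*}
\boldsymbol{g}^{(1)}=(1,1,1):&\quad \sum_k w_k^{(1)}p_k = \frac{0.06^2+0.04^2+0.02^2}{0.06+0.04+0.02} \approx 0.0467,\\
\boldsymbol{g}^{(2)}=(1,1,0):&\quad \sum_k w_k^{(2)}p_k = \frac{0.06^2+0.04^2}{0.06+0.04} = 0.052,\\
\boldsymbol{g}^{(3)}=(1,0,0):&\quad \sum_k w_k^{(3)}p_k = 0.06 = p_1.
\end{align*}
The expected response rate is nondecreasing across rounds. If learning stopped at $T=2$ with $K_0=2$ retained plans, the final cohort would still achieve $0.052 \geq p_{K_0}=0.04$, even though the single best plan has not been isolated.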

When the sample size is large enough for consistent estimation of the response rates, the following proposition follows from Remark \ref{rm:1}:
\begin{proposition} (\textit{Superiority})
Jointly with a pruning strategy $g^{(t)}_k(\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_K)$ in patient allocation, the adaptive learning strategy $w_k^{(t)} \propto f_k \cdot g_k^{(t)}$ can consistently improve recruitment efficiency over time, if some $p_k$'s are not equal.
\end{proposition}



\vspace{-5pt}
\subsection{Modeling and design considerations in selecting the recruitment plan(s)}\label{sec:designrules}

Implementing the above adaptive learning framework requires careful consideration of several design choices. Below we describe a feasible approach and its justification, specifically regarding the ensemble modeling, total round determination, early termination rules, and sample size calculation for each round.

\vspace{-5pt}
\subsubsection{Ensemble modeling for response rate prediction}\label{sec:ensemble}
\vspace{-5pt}
We illustrate the response prediction model using ensemble learning, which combines the predictions from multiple ML algorithms (as base learners) to make robust predictions \citep{Dietterich2000,Guzman2015,silva2014}. %To predict recruitment responses, our ensemble algorithm incorporates seven mainstream supervised learning ML models as base learners. 
The ensemble model can be fine-tuned using the best-fitting parameters, which are identified through a grid search coupled with 10-fold cross-validation. The 7 ML algorithms selected for the simulation study fall into two groups: parametric models, including logistic regression \citep{Cox1958}, lasso regression \citep{Tibshirani1996}, and ridge regression \citep{Hoerl2000}; and non-parametric models, including gradient boosting machine (GBM) \citep{Friedman2001}, random forest (RF) \citep{Ho1995}, Extreme Gradient Boosting (XGBoost) \citep{chen2016}, and artificial neural networks (NNs) \citep{Grossberg1988}. In general, the selection of base learners can be customized to the study objective and dataset.


\vspace{-5pt}
\subsubsection{Sample size calculation}\label{sec:samplesizecalc}
\vspace{-5pt}
When the total sample size is limited, we can determine the minimum sample size required for round $t$ from the observed response rate for round $t-1$, denoted by $\hat{p}^{(t-1)}$, and a user-specified expected improvement in effect size $\Delta$. Assuming a target power of $1-\beta^{(t)}$, where $\beta^{(t)}$ represents the Type II error rate at round $t$, we test the hypotheses
\begin{align*}
H_0 &: p^{(t)} - p^{(t-1)} = 0, \\
H_1 &: p^{(t)} - p^{(t-1)} > 0.
\end{align*}
The minimum sample size required to reject the null hypothesis at round $t$, denoted by $n_{\text{min}}^{(t)}$, is
\[
n_{\text{min}}^{(t)} = \frac{\hat{p}^{(t)}(1-\hat{p}^{(t)})}{\frac{\Delta^2}{(Z_{1-\alpha}+Z_{1-\beta^{(t)}})^2} - \frac{\hat{p}^{(t-1)}(1-\hat{p}^{(t-1)})}{n^{(t-1)}}}.
\]
Here, $n^{(t-1)}$ represents the observed sample size for round $t-1$, $\alpha$ denotes the Type I error rate, and $\hat{p}^{(t)}$ is calculated as $\hat{p}^{(t)} = \hat{p}^{(t-1)} + \Delta$. $Z_{1-\alpha}$ and $Z_{1-\beta^{(t)}}$ are the corresponding quantiles of the standard normal distribution.
Additionally, the cohort sample size is capped at the total sample size divided by the planned number of rounds, i.e., $N/T_0$. Hence, the sample size for round $t$ is determined as $n^{(t)} = \min\{n_{\text{min}}^{(t)}, N/T_0\}$.
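
As a rough numerical illustration, consider inputs that are assumptions made only for this example: $\hat{p}^{(t-1)}=0.03$, $\Delta=0.01$ (so $\hat{p}^{(t)}=0.04$), a one-sided $\alpha=0.05$ with $Z_{0.95}\approx 1.645$, a target power of $0.80$ with $Z_{0.80}\approx 0.842$, and $n^{(t-1)}=N/T_0=175{,}000/6\approx 29{,}167$ as in the round-1 cohort of the SilverSneakers setting. Then
\begin{align*}
\frac{\Delta^2}{(Z_{1-\alpha}+Z_{1-\beta^{(t)}})^2} &\approx \frac{0.01^2}{(1.645+0.842)^2} \approx 1.617\times 10^{-5},\\
\frac{\hat{p}^{(t-1)}(1-\hat{p}^{(t-1)})}{n^{(t-1)}} &\approx \frac{0.03\times 0.97}{29{,}167} \approx 9.98\times 10^{-7},\\
n_{\text{min}}^{(t)} &\approx \frac{0.04\times 0.96}{1.617\times 10^{-5}-9.98\times 10^{-7}} \approx 2.53\times 10^{3}.
\end{align*}
Under these assumed inputs, roughly $2{,}530$ recruitment letters would suffice for round $t$, well below the per-round cap of about $29{,}167$ when $T=T_0=6$, so the cap in the $\min$ rule would not be binding.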

% The sample size calculation determined that the precision of the observed Objective Response Rate (ORR) meets the pre-specified improvement of the effect size. This indicates that the study was adequately powered to detect the desired effect size, ensuring that any observed differences in ORR are statistically meaningful and reliable.
\vspace{-5pt}
\subsubsection{Early termination}\label{sec:earlyterm}\vspace{-5pt}
The adaptive learning procedure may stop early under any of three conditions, each indicating that adaptive learning can no longer yield a significant improvement in recruitment allocation. First, the precision of the observed ORR has met the power requirement, as indicated by $n_{\text{min}}^{(t)}<0$.
 Second, only one recruitment plan remains in $D^{(t-1)}$. Third, the following condition is met:
$$
    \widehat{ORR}^{(t)} - \widehat{ORR}^{(t-1)} < \epsilon 
$$
where $\widehat{ORR}^{(t)}$ represents the predicted ORR across all recruitment plans using data $D^{(t-1)}$, and $\epsilon$ is a pre-specified threshold. %$\widehat{ORR}^{(t-1)} = \sum_{k=1}^{K} \hat{p}_k^{(t)}w_k^{(t-1)}, \widehat{ORR}^{(t)} = \sum_{k=1}^{K} \hat{p}_k^{(t)}w_k^{(t)}$. 
According to the adaptive learning framework (Section \ref{sec:procedure}), recruitment plans with higher response rates are prioritized and assigned greater weight for subsequent rounds of the trial. Based on the definition of $\widehat{ORR}^{(t-1)}$ and $\widehat{ORR}^{(t)}$, we can readily establish the following proposition:
\begin{proposition}
$\widehat{ORR}^{(t)} - \widehat{ORR}^{(t-1)} > 0$
\end{proposition}
The third stopping criterion signifies that the updated recruitment plans at round $t$ yield only marginal enhancements to the overall response rate. When any of the three conditions is met, the adaptive learning terminates at $T=t$ by consolidating all remaining samples into a single cohort with an allocation probability of $W^{(T)}=W^{(t)}$. 
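
To illustrate the third criterion and the consolidation step with hypothetical numbers (none are taken from the simulation results), suppose $N=175000$, the first two cohorts used $n^{(1)}=35000$ and $n^{(2)}=4000$ samples, and the threshold is $\epsilon=0.001$. If the predicted rates at round $t=3$ are $\widehat{ORR}^{(2)}=0.0850$ and $\widehat{ORR}^{(3)}=0.0853$, then
$$
\widehat{ORR}^{(3)} - \widehat{ORR}^{(2)} = 0.0003 < \epsilon,
$$
so the procedure would stop with $T=3$, and the final cohort would receive the remaining $175000-35000-4000=136000$ samples, allocated according to $W^{(3)}$.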

\vspace{-5pt}
\subsubsection{Total round determination}\vspace{-5pt}
For adaptive learning, it is also important to specify the total number of rounds $T$, which can be fixed in advance or treated as random. From a cost-effectiveness perspective, $T$ is constrained by the study recruitment duration ($Tot_t$) and the total cost ($Tot_c$). Let $C_1$ denote the fixed cost of each recruitment round, $C_2$ the cost per letter (i.e., per unit of sample size), and $Time_r$ the projected duration of each round. With $N$ potential participants to be contacted over $T$ rounds, the estimated total cost and duration of the recruitment are:
\begin{align*} 
    \widehat{Cost} &= T\times C_1 + N\times C_2, \\
    \widehat{Time} &= T\times Time_r.
\end{align*}
Therefore, the total number of rounds $T$ is determined by two conditions: $\widehat{Cost}\le Tot_c$ and $\widehat{Time}\le Tot_t$. For feasibility, we fixed the total number of rounds at 6 in the simulation (Section \ref{sec:simulation}).

\xx
In practice, researchers should begin with a conservative estimate of the time frame required for each round of patient recruitment and response collection, denoted as $Time_r$ (e.g., 2 months). Given the total scheduled duration for the recruitment phase, $Tot_t$ (e.g., 1 year), the maximum number of rounds $T_{max}$ can be determined as $T_{max} = Tot_t / Time_r$. The total recruitment cost should then be evaluated for $T_{max}$ rounds to ensure it meets the cost constraints. If the cost exceeds the available resources, the number of rounds $T$ can be reduced below $T_{max}$ accordingly.
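
Combining the cost and time constraints gives an explicit upper bound on the number of rounds (rounding down to an integer is our addition):
$$
T \le \min\left\{ \left\lfloor \frac{Tot_t}{Time_r} \right\rfloor,\ \left\lfloor \frac{Tot_c - N\times C_2}{C_1} \right\rfloor \right\}.
$$
As a purely hypothetical illustration, take $Time_r=2$ months, $Tot_t=12$ months, $N=175000$, $C_1=2000$, $C_2=1$, and $Tot_c=185000$ (costs in arbitrary monetary units). The time constraint allows $12/2=6$ rounds, whereas the cost constraint allows only $(185000-175000)/2000=5$ rounds, so $T\le 5$ in this example.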

To assess the potential impact of varying $T$, simulation studies will be conducted as sensitivity analyses, examining the effects of increasing or decreasing the number of rounds. Generally, we recommend $T=4$ to $8$ rounds for effective adaptive learning, as this range tends to strike a balance between computational efficiency and the ability to learn and adapt over multiple iterations. However, the final selection of $T$ should be made jointly with the study's principal investigator (PI), considering the overall recruitment strategy and resource limitations.

\begin{table*}[!b]
  \caption{Simulation results for Scenario 1 with 5 design features. \xx Values for ORRs, plan numbers, expected rounds, and sample size for the last round are means, with standard deviations in parentheses. \xx}
  \label{tbl:s1_5designs}
  \includegraphics[width=\linewidth]{scenario1_5d.png}
\end{table*}

\xx
\vspace{-5pt}
\subsubsection{Total recruitment plan determination}\label{sec:totalplannumber}\vspace{-5pt}
Assume that we have determined the total number of rounds $T$, an estimated response rate ($\theta_0$), and an estimated minimum number of responders for each recruitment plan ($S$). This leads to the inequality $K\times T\times S < N\times\theta_0$, where $K$ is the total number of recruitment plans and $N$ is the total sample size. The inequality implies that $K$ is subject to the constraint:
$$
K < \frac{N\times\theta_0}{T\times S}.
$$
This constraint ensures the adequacy of data for our machine learning model to effectively discern the impact of different recruitment plans. By adhering to this constraint, we guarantee the availability of sufficient data points necessary for accurate analysis and interpretation of the effects of the recruitment strategies.
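
As a rough numerical check under assumed inputs, take $N=175000$ and $\theta_0=0.055$, so that about $N\times\theta_0=9625$ responders are expected in total. With $T=5$ rounds and a hypothetical minimum of $S=50$ responders per plan,
$$
K < \frac{9625}{5\times 50} = 38.5,
$$
which would accommodate the $2^5=32$ plans generated by five binary design features. With $T=6$ and $2^8=256$ plans, the same inequality would instead require $S < 9625/(6\times 256)\approx 6.3$.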



\section{Simulation}\label{sec:simulation}

We conduct simulation studies and design three scenarios to examine the efficiency of adaptive ensemble learning for participant recruitment in the context of the SilverSneakers program (Section \ref{sec:silversneaker}). Details on the simulation settings are described in the Supplementary Section \ref{sec:appendixsimsetting}.

% overall performance:
The simulation results (Table \ref{tbl:s1_5designs} and Tables \ref{tbl:s1_8designs}--\ref{tbl:s2b_8designs} in the Supplementary Material) highlight the progressive nature of adaptive learning, wherein recruitment plans with superior response rates are increasingly favored over successive rounds. The ORR in the last round closely approximates the highest true response rate (RR), and the ORR converges notably toward the highest true RR after round 2 (e.g., Table \ref{tbl:s1_5designs}(A)). The number of remaining recruitment plans in the final round is around 2 (e.g., Table \ref{tbl:s1_5designs}(B)), which demonstrates that the adaptive learning framework assigns zero weight to ineffective recruitment plans within the first two rounds.

Notably, the early stopping rate is high (e.g., Table \ref{tbl:s1_5designs}(C)), mostly because a single recruitment plan is selected for the next cohort sampling (Algorithm \ref{alg:adaptive}). The sample size used for the last round in Scenarios 1 and 2 is at least 110,000, i.e., at least 62.8\% of the total sample size of 175,000. The adaptive learning approach thus achieves its objectives using less than half of the available data, underscoring its efficiency in identifying and selecting the most promising recruitment plan.

The true response rate assignment in Scenario 1 does not favor any design features. As shown in Tables \ref{tbl:s1_5designs}(A) and \ref{tbl:s1_8designs}(A), the tree-based methods and the two ensemble learning approaches attain the highest true response rate, 0.097 (0.003), whereas logistic ridge regression falls short, with an overall ORR of 0.080 (0.010) and an adaptive learning ORR of 0.087 (0.012). Although logistic ridge regression is included in the ensemble learning methods, the robustness of ensemble learning enables them to perform on par with the tree-based methods, random forest and XGBoost. This demonstrates the efficacy of ensemble learning in mitigating the risk of selecting a less effective machine learning method. 

Scenario 2(a) constructs an additive setting for the design features, which favors logistic regression. Consequently, logistic ridge regression outperforms random forest and XGBoost, as shown in Tables \ref{tbl:s2a_5designs} and \ref{tbl:s2a_8designs}. Conversely, Scenario 2(b) adds an interaction term between design features, allowing the tree-based methods to outperform logistic ridge regression, as shown in Tables \ref{tbl:s2b_5designs} and \ref{tbl:s2b_8designs}. Although these two settings are designed to favor different base learners, the ensemble learning methods contain at least one of the favored learners and therefore perform comparably well. Moreover, comparing the ensemble methods with 3 and 7 base learners reveals that incorporating more learners does not compromise overall performance and can enhance it in certain instances. For example (Table \ref{tbl:s2a_8designs}(A)), the ensemble with 7 learners outperforms the other methods, including the ensemble with 3 learners, from round 2, while the latter catches up with the 7-learner ensemble from round 3. This observation underscores the robustness of ensemble learning across diverse conditions.

We also compared the proposed adaptive learning framework with the benchmark in the three scenarios. Because the benchmark involves no recruitment plan selection or allocation prediction, its ORR is always 0.055. In contrast, the results tables show that the overall ORRs of all candidate methods in Scenarios 1 and 2 closely approximate the highest true response rates, namely 0.097 and 0.1 for Scenario 1 with 5 and 8 design features, respectively, and 0.068 for Scenario 2. As reported in panel (C) of each results table, the ensemble learning methods outperform the benchmark almost 100\% of the time. In Scenario 3, where all recruitment plans share the same true response rate and plan selection therefore yields no gain, both the benchmark and the adaptive learning approach maintain a steady ORR of 0.055 throughout the entire process, indicating non-inferior performance of the adaptive learning approach. These findings indicate that the adaptive learning framework performs comparably to, or notably better than, the random approach (benchmark), which supports its robustness and effectiveness in optimizing recruitment plans under various conditions and its suitability for practical implementation.

In summary, we designed three scenarios to demonstrate the robustness of the adaptive learning framework with the ensemble learning method. Compared with the random approach, the adaptive learning framework can effectively select the most effective recruitment plan in a fast and efficient manner. The incorporation of ensemble learning into allocation prediction mitigates the risk of choosing undesirable machine learning methods, ensuring consistent and robust performance across diverse scenarios. 
% This approach not only expedites the recruitment process but also provides assurance of high performance under varying clinical conditions.

\xx
\section{Extensions}
\vspace{-3pt}
%The proposed approach can be extended in the following directions:
Below we highlight some potential extensions of the illustrated method.

\ul{Refining pruning strategy:} To streamline recruitment plan selection, K-means could be replaced with alternatives such as X-means \citep{xmeans}, which selects the number of clusters dynamically and thereby overcomes a key limitation of K-means. X-means also improves computational efficiency, making it a more robust choice in applications.

\ul{Global optimization in response estimation:} The proposed approach focuses on maximizing predicted plan response rates. Alternative methods, such as the multi-armed bandit approach \citep{pmlr-v23-agrawal12,MNLBANDIT}, which simultaneously balances exploration with exploitation, may lead to a more efficient adaptive recruitment strategy by globally maximizing the total predictive reward.

\ul{Enhancing recruitment for underrepresented patients:} A group-specific recruitment allocation plan can be implemented to mitigate health disparities and enhance the recruitment of underrepresented populations. This extension involves clustering patients based on their demographic and clinical characteristics and optimizing recruitment strategies tailored to each group.

\ul{Incorporating external evidence:} When relevant external data or expert opinions are available, they can be converted into prior distributions, and the allocation optimization can then be conducted using Bayesian learning \citep{Gelman2013BayesianAnalysis}. Furthermore, in clinically diverse settings with multiple patient groups, it is compelling to employ learning methods that incorporate both plan features and patient characteristics. This could lead to better identification of optimal recruitment plans for specific patient subgroups and to further efficiency gains.

\ul{Sequential assignment for non-responders:} In the context of rare diseases where the pool of eligible patients is limited, the proposed approach can be adapted to re-assign non-responders to receive additional recruitment plans. This extension tests ML-guided recruitment strategies in sequential order for patients who do not respond, maximizing the chances of successful enrollment in the study.



%\vspace{-5pt}
\xx

%\vspace{-15 pt}
\section{Conclusion}
\vspace{-2pt}
\xx
Patient recruitment remains a critical challenge in large-scale pragmatic clinical trials, necessitating extensive sample sizes and diverse patient populations. Traditional strategies often struggle to meet demanding recruitment requirements across varied clinical settings. We proposed a novel adaptive learning framework integrating ensemble learning to iteratively optimize patient recruitment. Through simulations, we demonstrated the proposed framework could efficiently identify and prioritize the most effective recruitment plans while mitigating the risk of selecting suboptimal recruitment plans. This work establishes a foundation for leveraging AI/ML to address longstanding recruitment challenges, facilitating more efficient pragmatic trials by substantially improving recruitment rates and accelerating clinical research.
\xx




\begin{contributions} % will be removed in pdf for initial submission 
					  % (without ‘accepted’ option in \documentclass)
                      % so you can already fill it to test with the
                      % ‘accepted’ class option
List of Authors:     Xinying~Fang (X.F.), Shouhao~Zhou (S.Z.)

S.Z. conceived the original idea. X.F. created the code, ran the experiments, and generated the figures, with regular feedback from S.Z. X.F. and S.Z. wrote the paper.
\end{contributions}

\begin{acknowledgements} % will be removed in pdf for initial submission,
						 % (without ‘accepted’ option in \documentclass)
                         % so you can already fill it to test with the
                         % ‘accepted’ class option
    We would like to thank the editor and five anonymous reviewers for their insightful comments. We are grateful to Drs. Liza Rovniak and Christopher Sciamanna for discussion of the illustrative study.

%    \emph{All} acknowledgements go in this section.
\end{acknowledgements}

% References
\bibliography{uai2024-template}

\clearpage

\onecolumn

\title{Supplementary Material}
\maketitle

\appendix





\xx 
% \vspace{-120pt}
\section{Proofs}

\subsection{Proof of Lemma 1} \label{sec:proof_lemma1}
By the Cauchy-Schwarz inequality,
$$\Bigl(\sum_k f_k \cdot 1\Bigr)^2 \leq K\sum_k f_k^2.$$
Dividing both sides by $K\sum_k f_k$ (assuming $\sum_k f_k>0$) gives
$$\sum_k f_k^2 \Big/ \sum_k f_k \geq \sum_k f_k /K.$$
Thus, with $f_k(p_1, p_2, \cdots, p_K) \propto p_k$, we obtain
$$\sum_k f_kp_k \Big/ \sum_k f_k \geq \sum_k p_k / K.$$
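
As a quick numerical check with hypothetical rates $(p_1,p_2,p_3)=(0.08,0.05,0.02)$ and $f_k=p_k$,
$$
\frac{\sum_k f_k p_k}{\sum_k f_k} = \frac{0.0064+0.0025+0.0004}{0.15} = 0.062 \;\geq\; \frac{0.15}{3} = 0.05,
$$
so, in this example, allocating in proportion to the response rates yields a higher expected response rate than equal allocation.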



\subsection{Proof of Lemma 2} \label{sec:proof_lemma2}

For $1 \geq p_1 \geq p_2 \geq \cdots \geq p_K \geq 0$, we have
\begin{eqnarray*}
    \sum_k w_k^{(t)} p_k &=& \sum_k f_k g_k^{(t)} p_k / \sum_k f_k g_k^{(t)} \\
    &\leq& \sum_k f_k g_k^{(t)} p_1 / \sum_k f_k g_k^{(t)} = p_1
\end{eqnarray*}
Also, we know that $g_k^{(T)} = 1$ if $k = 1$ and $g_k^{(T)} = 0$ otherwise, so 
\begin{eqnarray*}
    \sum_k f_k g_k^{(T)} p_k / \sum_k f_k g_k^{(T)}= f_1 p_1 / f_1 = p_1.
\end{eqnarray*}
Therefore, we can conclude that
\begin{eqnarray*}
    \sum_k w_k^{(t)} p_k &=& \sum_k f_k g_k^{(t)} p_k / \sum_k f_k g_k^{(t)} \\
    &\leq& \sum_k f_k g_k^{(T)} p_k / \sum_k f_k g_k^{(T)}= p_1.
\end{eqnarray*}
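
For a concrete, hypothetical illustration, let $(p_1,p_2,p_3)=(0.08,0.05,0.02)$ with $f_k=p_k$, and suppose plan 3 is pruned at round $t$, i.e., $(g_1^{(t)},g_2^{(t)},g_3^{(t)})=(1,1,0)$. Then
$$
\sum_k w_k^{(t)} p_k = \frac{0.0064+0.0025}{0.13} \approx 0.068 \leq p_1 = 0.08,
$$
and the bound is attained once only plan 1 remains.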

\xx
\newpage
\section{Simulation setting}\label{sec:appendixsimsetting}
For illustrative purposes, we simplify the setting by disregarding participant-specific characteristics and test on $5$ and $8$ binary design features, leading to $2^5=32$ and $2^8=256$ candidate recruitment plans, respectively. We compare five methods: logistic regression with an $\ell_2$ penalty, random forest, XGBoost, ensemble learning with these three methods, and ensemble learning with the seven methods mentioned in Section \ref{sec:ensemble}.

The true underlying response rate $p_k$ of each recruitment plan is defined using the following three scenarios: \vspace{-5pt}
\begin{enumerate}
    \item The true response rates for each recruitment plan are randomly assigned within $[0.01, 0.1]$.
    \item Logistic regression scenario:
    \begin{enumerate}
        \item Assign fixed coefficients to the design features. Let $\beta_1=0.5$ for design feature 1 and $\beta_2=-0.5$ for design feature 2, with the coefficients for all other design features set to 0. The response rate for each recruitment plan is then
        $\operatorname{logit}^{-1}(\beta_1 x_1+\beta_2 x_2)\times 0.11$, where $x_1$ and $x_2$ are binary indicators for design features 1 and 2. The multiplying factor $0.11$ keeps the expected overall recruitment response rate at 0.055. 
        \item Assign fixed coefficients along with an interaction. We assign $\beta_1=-0.5$ for design feature 1 and $\beta_{12}=1$ for the interaction between design features 1 and 2. The coefficients for all other design features are 0. The response rate for each recruitment plan is then $\operatorname{logit}^{-1}(\beta_1 x_1+\beta_{12} x_1 x_2)\times 0.11$. The multiplying factor $0.11$ again keeps the expected overall recruitment response rate at 0.055.
    \end{enumerate}
    \item The true response rates for each recruitment plan are equal to 0.055.
\end{enumerate} \vspace{-5pt}
While all scenarios have the same expected overall recruitment response rate of 0.055, Scenario 3 is the worst case for learning because no improvement can be made. Nevertheless, we include it to examine the non-inferiority of the proposed learning procedure.
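
For reference, the response rates implied by Scenario 2 can be tabulated directly from the stated coefficients. The rounded values below are our own evaluation of the formulas, using the fact that the inverse logit equals $0.5$ at $0$, approximately $0.622$ at $0.5$, and approximately $0.378$ at $-0.5$:
\begin{center}
\begin{tabular}{ccc}
\toprule
$(x_1,x_2)$ & Scenario 2(a) & Scenario 2(b) \\
\midrule
$(0,0)$ & 0.0550 & 0.0550 \\
$(1,0)$ & 0.0685 & 0.0415 \\
$(0,1)$ & 0.0415 & 0.0550 \\
$(1,1)$ & 0.0550 & 0.0685 \\
\bottomrule
\end{tabular}
\end{center}
In both settings the four values average to $0.055$, and the largest rate is approximately $0.0685$.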

We limit the total sample size (letters) to 175,000, and the total number of rounds of the experiment, $T$, is 5 and 6 for 5 and 8 design features, respectively. Thus, $175000/5 = 35000$ and $175000/6 \approx 29167$ participants, respectively, will be contacted as cohort 1 at round 1. The remaining samples will be allocated across subsequent rounds based on the sample size calculation (Section \ref{sec:samplesizecalc}) and the early termination rule (Section \ref{sec:earlyterm}). We repeat the data-generating process 100 times to calculate the ORR within each cohort and over the whole sample. As a benchmark, we employ a random approach in which recruitment plans are randomly assigned to participants at each round. To evaluate the adaptive learning framework against this benchmark, we conduct a binomial hypothesis test at each replication to determine how often the adaptive learning approach outperforms the benchmark.  \xx The adaptive learning procedure for the simulation study
is illustrated in Algorithm \ref{alg:simulation}.\xx

\RestyleAlgo{ruled}
%% This declares a command \Comment
%% The argument will be surrounded by /* ... */
\SetKwComment{Comment}{/* }{ */}

\begin{algorithm*}[!hbt]
\caption{Adaptive learning procedure for the simulation study}\label{alg:simulation}
\textbf{Inputs:} initial round $t = 1$, total round $T=T_0$, sample size for round 1 $n^{(1)} = N/T_0$.

 \While{$t \le T$}{
  
  \eIf{$t = 1$}{
    Randomly assign all patients in cohort 1 according to $w_k^{(1)}=1/K$, where $k=1, ..., K$, and obtain data $D^{(1)}$\;
   }{
   \begin{minipage}{0.85\linewidth}
   \begin{enumerate}[leftmargin=*]
       \item Given the data $D^{(t-1)}$ collected up to round $t-1$, 
        \begin{enumerate}[leftmargin=+.5in]
          \item Apply a learning model (e.g., an ensemble model or a base learner) to predict the plan response rates $\hat{p}_k^{(t-1)}$ among the recruitment plans $k\in C^{(t-1)}$ (i.e., the set of recruitment plans with the adaptive pruning factor $g^{(t-1)}_k=1$);
          \item Perform K-means clustering on the predicted plan response rates $\{\hat{p}_k^{(t-1)}\}$, $k\in C^{(t-1)}$;
          \item Assign (keep) $g^{(t)}_k=1$ to the recruitment plans in the best-performing cluster, denoted by $C^{(t)}$. All the other recruitment plans $k \notin C^{(t)}$ are pruned with $g^{(t)}_k = 0$;
        \end{enumerate}
       \item Calculate the allocation rates $\boldsymbol{W}^{(t)} = (w_1^{(t)}, w_2^{(t)}, ..., w_K^{(t)})$ with
      $$w_k^{(t)} = \frac{\hat{p}_k^{(t-1)} \cdot g_k^{(t)} }{\sum_k \hat{p}_k^{(t-1)} \cdot g_k^{(t)} }$$
       \item \textbf{if} $t<T$ \textbf{then} \textbf{if}
       % \hspace*{3mm}
        \begin{enumerate}[leftmargin=+.5in]
          \item $n_{\text{min}}^{(t)}<0$ (the precision of the observed ORR has met the power requirement);\\
          \textbf{or} \item $\exists \ k$, $w_k^{(t)}=1$ (single recruitment plan selected for next cohort sampling);\\ \textbf{or} \item $\widehat{ORR}^{(t)} - \widehat{ORR}^{(t-1)} < \epsilon$ (limited improvement on predicted ORR);
        \end{enumerate}
        \hspace*{4mm} \textbf{then} (Early stopping) %(if any of above 3 conditions meet)
        \begin{itemize}[leftmargin=+.5in]
            \item[] Terminate the adaptive learning with $T=t$ by combining all remaining samples into a single cohort with sample size $n^{(t)}=N-\sum_{s=1}^{t-1} n^{(s)}$;
        \end{itemize}
          
        \textbf{else} 
        \begin{itemize}[leftmargin=+.2in]
            \item[] Calculate cohort $t$ sample size, $n^{(t)}$ (Section \ref{sec:samplesizecalc}); %Identify the minimum sample size required to reach a certain improvement of the overall response rate;
        \end{itemize}
        
      \item Randomly assign recruitment plans $1, ..., K$ to individuals in cohort $t$ according to $\boldsymbol{W}^{(t)}$ and collect response data, which will be combined with data $D^{(t-1)}$ collected in previous rounds to generate the updated data $D^{(t)}$;
      \item $t=t+1$;
   \end{enumerate}
    \end{minipage}
  }
 }
\KwResult{Participant response data collected up to round $T$, $D^{(T)}$, and overall response rate over $N$ samples, $ORR^{(T)}$.}
\end{algorithm*}
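
To make steps 1 and 2 of Algorithm \ref{alg:simulation} concrete, consider a hypothetical round with four surviving plans and predicted rates $(\hat{p}_1,\ldots,\hat{p}_4)=(0.09, 0.08, 0.04, 0.03)$. Assuming K-means is run with two clusters (the number of clusters is our assumption), the top cluster is $C^{(t)}=\{1,2\}$, so $(g_1^{(t)},\ldots,g_4^{(t)})=(1,1,0,0)$ and
$$
\boldsymbol{W}^{(t)} = \left(\frac{0.09}{0.17},\ \frac{0.08}{0.17},\ 0,\ 0\right) \approx (0.529,\ 0.471,\ 0,\ 0).
$$
Under these weights, the weighted average of the predicted rates is $(0.0081+0.0064)/0.17\approx 0.085$, compared with $0.06$ under equal allocation across the four plans.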


\clearpage
\section{Additional simulation results}

\begin{table*}[!htb]
  \caption{Simulation results for Scenario 1 with 8 design features. Values for ORRs, plan numbers, expected rounds, and sample size for the last round are means, with standard deviations in parentheses.}
  \label{tbl:s1_8designs}
  \includegraphics[width=\linewidth]{scenario1_8d.png}
\end{table*}

\begin{table*}[!htb]
  \caption{Simulation results for Scenario 2(a) with 5 design features.}
  \label{tbl:s2a_5designs}
  \includegraphics[width=\linewidth]{scenario2a_5d.png}
\end{table*}

\begin{table*}[!htb]
  \caption{Simulation results for Scenario 2(a) with 8 design features.}
  \label{tbl:s2a_8designs}
  \includegraphics[width=\linewidth]{scenario2a_8d.png}
\end{table*}

\begin{table*}[!htb]
  \caption{Simulation results for Scenario 2(b) with 5 design features.}
  \label{tbl:s2b_5designs}
  \includegraphics[width=\linewidth]{scenario2b_5d.png}
\end{table*}

\begin{table*}[!htb]
  \caption{Simulation results for Scenario 2(b) with 8 design features.}
  \label{tbl:s2b_8designs}
  \includegraphics[width=\linewidth]{scenario2b_8d.png}
\end{table*}




\end{document}
