\clearpage

\appendix

\renewcommand \thepart{}
\renewcommand \partname{}

\part{Appendix} % Start the appendix part
\setcounter{secnumdepth}{4}
\setcounter{tocdepth}{4}
% \parttoc % Insert the appendix TOC

\section{Implementation Details}
\label{appendix:implementation}

We provide implementation details for reproducibility. All experiments are conducted on NVIDIA Ampere GPUs.

\smallskip\noindent
\textbf{Baseline Fusion Models.} We train 15 fusion classifiers using the AdamW optimizer (learning rate $1 \times 10^{-3}$, weight decay $1 \times 10^{-4}$), a batch size of 32, and early stopping with a patience of 15 epochs. Each modality encoder projects its input features to a 32-dimensional representation. The encoded features are then fused, and the fused dimension depends on the fusion type: concat yields $32n$, mean yields 32 (averaged encoder outputs), and tensor yields $(16+1)^{n}$ (Kronecker product with bias term), where $n$ is the number of modalities. All classifiers share a 2-layer MLP structure: Linear(fused\_dim$\to$32) $\to$ ReLU $\to$ Dropout(0.3) $\to$ Linear(32$\to$2).
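
\smallskip\noindent
As a worked check of the dimension formulas above (a direct evaluation, not an additional result; the symbols $d_{\text{concat}}$, $d_{\text{mean}}$, and $d_{\text{tensor}}$ are notation introduced here for illustration), the fused dimensions for $n \in \{2, 3\}$ modalities are
\[
d_{\text{concat}}(n) = 32n \in \{64,\, 96\}, \qquad
d_{\text{mean}}(n) = 32, \qquad
d_{\text{tensor}}(n) = (16+1)^{n} \in \{289,\, 4913\}.
\]
The count of 15 classifiers is consistent with three unimodal models plus three fusion types applied to each of the four multimodal subsets ($AB$, $AC$, $BC$, $ABC$), i.e., $3 + 3 \times 4 = 15$, which matches the baselines listed in Table~\ref{tab:statistical}.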

\smallskip\noindent
\textbf{AdaFuse.} We first pre-train the baseline classifiers, then train the policy network while keeping the classifiers frozen. We use AdamW with separate learning rates for the policy ($3 \times 10^{-4}$) and the encoders ($1 \times 10^{-5}$). The temperature is annealed linearly from $\tau_{\text{init}}=1.5$ to $\tau_{\text{final}}=0.3$ over 100 epochs; inference uses greedy decoding ($\tau \to 0$). The loss weights are $\lambda_{\text{ent}}=0.1$ and $\lambda_{\text{sup}}=0.3$, and the reward weight is $\lambda_{\text{auc}}=0.3$. We use balanced sampling with approximately 30\% positive samples per batch to stabilize the AUC reward computation.
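
\smallskip\noindent
One way to write the linear temperature schedule explicitly, assuming the temperature is updated once per epoch and epochs are indexed $t = 0, \dots, T$ with $T = 100$ (both the indexing and the per-epoch update are assumptions made for illustration), is
\[
\tau_t = \tau_{\text{init}} + \frac{t}{T}\,\bigl(\tau_{\text{final}} - \tau_{\text{init}}\bigr) = 1.5 - 0.012\,t ,
\]
so that $\tau_0 = 1.5$, $\tau_{50} = 0.9$, and $\tau_{100} = 0.3$.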

\smallskip\noindent
\textbf{MoE Baseline.} The gating network is an MLP with two hidden layers (96$\to$64$\to$64$\to$15) that outputs soft weights over the 15 frozen expert classifiers.
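
\smallskip\noindent
A minimal sketch of how such a gate is commonly applied, assuming a softmax over the gate outputs and a weighted average of the expert predictions (the symbols $g$, $w_k$, and $f_k$ are notation introduced here, and the exact combination rule is an assumption rather than a statement of the implementation):
\[
w(x) = \mathrm{softmax}\bigl(g(x)\bigr), \qquad
\hat{p}(y \mid x) = \sum_{k=1}^{15} w_k(x)\, f_k(y \mid x), \qquad
\sum_{k=1}^{15} w_k(x) = 1,
\]
where $g(x)$ is the 15-dimensional gate output and $f_k$ denotes the predicted class distribution of the $k$-th frozen expert. Under this reading, only the parameters of $g$ are updated during training.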

\clearpage

\section{Analysis of Learned Policy Behavior}
\label{sec:appendix_behavior}
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{img/fig_appendix.png}
\caption{\textbf{AdaFuse policy selection distribution and modality skip rates on the test set.} Left: Distribution of modality-fusion combinations selected by the learned policy, where $N$ denotes the number of patients. The policy most frequently selects ABC-concat ($N=288$, 62.3\%), followed by AB-tensor ($N=54$, 11.7\%) and clinical-only ($N=37$, 8.0\%). Right: Frequency of skipping each modality. The text modality (C) is skipped for 143 patients (31.0\%), while CT (A) and clinical variables (B) are each skipped for only 12 patients (2.6\%). This suggests that the policy learns to filter out the less informative text modality while consistently relying on imaging and clinical data.}
\label{fig:policy_analysis}
\end{figure}


\clearpage

\section{Statistical Analysis of AUC Comparisons}
\label{appendix:statistical}

\begin{table}[h]
\centering
\scriptsize
\caption{\textbf{Bootstrap confidence intervals and significance tests.} We report 95\% confidence intervals from bootstrap analysis (1000 iterations) and $p$-values from DeLong's test comparing each method against AdaFuse. The wide confidence intervals (width $\sim$0.20) across all methods reflect the limited number of positive cases ($n=28$) in the NLST test set, a constraint shared by all methods rather than a limitation specific to AdaFuse. Although the intervals of the top-performing methods overlap, AdaFuse achieves the highest point estimate (0.762) with the narrowest confidence interval (width 0.203) among competitive methods, suggesting that adaptive selection reduces prediction variance. AdaFuse significantly outperforms the text-only baseline ($p=0.019$).}
\label{tab:statistical}
\begin{tabular}{p{0.22\linewidth} P{0.12\linewidth} P{0.22\linewidth} P{0.22\linewidth}}
\toprule
Method & AUC & 95\% CI & p-value vs AdaFuse \\
\midrule
\textbf{AdaFuse (Ours)} & \textbf{0.762} & [0.657, 0.860] & -- \\
$ABC$-tensor & 0.759 & [0.646, 0.863] & 0.898 \\
$AB$-concat & 0.758 & [0.643, 0.867] & 0.901 \\
$AB$-mean & 0.755 & [0.640, 0.861] & 0.847 \\
DynMM & 0.754 & [0.640, 0.855] & 0.829 \\
$ABC$-mean & 0.748 & [0.631, 0.853] & 0.747 \\
$AC$-mean & 0.745 & [0.628, 0.849] & 0.399 \\
MoE & 0.742 & [0.628, 0.847] & 0.666 \\
$AC$-tensor & 0.739 & [0.618, 0.847] & 0.250 \\
$ABC$-concat & 0.735 & [0.622, 0.845] & 0.615 \\
$AB$-tensor & 0.735 & [0.617, 0.848] & 0.552 \\
$AC$-concat & 0.733 & [0.610, 0.843] & 0.148 \\
$A$ (CT) & 0.732 & [0.609, 0.842] & 0.116 \\
$BC$-tensor & 0.685 & [0.574, 0.792] & 0.191 \\
$BC$-mean & 0.678 & [0.566, 0.785] & 0.162 \\
$B$ (Clinical) & 0.662 & [0.544, 0.776] & 0.114 \\
$BC$-concat & 0.661 & [0.544, 0.771] & 0.107 \\
$C$ (Text) & 0.576 & [0.489, 0.657] & 0.019 \\
\bottomrule
\end{tabular}
\end{table}
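
\smallskip\noindent
For reference, the interval widths discussed in the caption of Table~\ref{tab:statistical} follow directly from the tabulated bounds, for example
\[
0.860 - 0.657 = 0.203 \ (\text{AdaFuse}), \qquad
0.863 - 0.646 = 0.217 \ (ABC\text{-tensor}), \qquad
0.855 - 0.640 = 0.215 \ (\text{DynMM}).
\]
Assuming a percentile bootstrap (the specific bootstrap variant is not stated, so this is an assumption made for illustration), the interval is formed from the empirical quantiles $q_{\alpha}$ of the resampled estimates $\widehat{\mathrm{AUC}}^{*}_{1}, \dots, \widehat{\mathrm{AUC}}^{*}_{B}$ with $B = 1000$:
\[
\mathrm{CI}_{95\%} = \bigl[\, q_{0.025},\; q_{0.975} \,\bigr].
\]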

\clearpage

\section{Data Examples}
\label{appendix:data}

We provide two representative patient examples from the NLST dataset to illustrate the three modalities used in AdaFuse.

\begin{table}[h]
\centering
\scriptsize
\caption{\textbf{Summary of modalities and feature extraction.}}
\label{tab:modality_overview}

\begin{tabular}{p{0.15\linewidth} p{0.25\linewidth} p{0.25\linewidth} p{0.15\linewidth}}
\toprule
Modality & Source & Feature Extractor & Dimension \\
\midrule
A: CT Image & Low-dose chest CT & Sybil (pretrained) & 512D \\
B: Clinical & Structured demographics & PLCO2012 transform & 17D \\
C: Text & Generated risk report & CORe (pretrained) & 768D \\
\bottomrule
\end{tabular}

\end{table}

\smallskip\noindent
\textbf{Patient Case 1: Lung Cancer Positive.} 68-year-old male, BMI 27.46, Asian, diagnosed with lung cancer.

\begin{table}[h]
\centering
\scriptsize
\caption{\textbf{Clinical variables (Modality B) for Patient Case 1.} Raw values are transformed following the PLCO2012 model.}
\label{tab:case1_clinical}

\begin{tabular}{p{0.18\linewidth} p{0.32\linewidth} p{0.18\linewidth} p{0.18\linewidth}}
\toprule
Variable & Description & Raw Value & Transformed \\
\midrule
age & Age at screening & 68 & 6.0 \\
race & Race (one-hot, 7 dims) & Asian & [0,0,0,0,1,0,0] \\
education & Education level & 3 & -1.0 \\
bmi & Body mass index & 27.46 & 0.46 \\
copd & COPD diagnosis & 0 & 0.0 \\
phist & Personal cancer history & 0 & 0.0 \\
fhist & Family lung cancer history & 0 & 0.0 \\
smo\_status & Smoking status & Current & 0.0 \\
smo\_intensity & Cigarettes per day & 30 & -0.069 \\
smo\_duration & Years smoked & 58 & 31.0 \\
quit\_time & Years since quitting & 0 & -10.0 \\
\bottomrule
\end{tabular}

\end{table}
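
\smallskip\noindent
The transformed values in Table~\ref{tab:case1_clinical} are consistent with simple centering of each continuous variable and an inverse transform for smoking intensity. The constants below are inferred from the table entries rather than stated in this appendix, so they should be read as an illustrative reconstruction:
\[
\text{age}: 68 - 62 = 6, \qquad
\text{bmi}: 27.46 - 27 = 0.46, \qquad
\text{education}: 3 - 4 = -1,
\]
\[
\text{smo\_duration}: 58 - 27 = 31, \qquad
\text{quit\_time}: 0 - 10 = -10, \qquad
\text{smo\_intensity}: (30/10)^{-1} - 0.4022 \approx -0.069 .
\]
The same constants reproduce the entries for Patient Case 2 in Table~\ref{tab:case2_clinical}, e.g., $(40/10)^{-1} - 0.4022 \approx -0.152$.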

\smallskip\noindent
\textbf{Generated Text Report (Modality C):} ``The patient reports no significant occupational exposures. No significant chronic medical conditions reported. The patient is exposed to secondhand smoke at home and secondhand smoke at workplace.''
\clearpage
\smallskip\noindent
\textbf{Patient Case 2: Lung Cancer Negative.} 65-year-old male, BMI 34.67, White, no lung cancer.

\begin{table}[h]
\centering
\scriptsize
\caption{\textbf{Clinical variables (Modality B) for Patient Case 2.} Raw values are transformed following the PLCO2012 model.}
\label{tab:case2_clinical}

\begin{tabular}{p{0.18\linewidth} p{0.32\linewidth} p{0.18\linewidth} p{0.18\linewidth}}
\toprule
Variable & Description & Raw Value & Transformed \\
\midrule
age & Age at screening & 65 & 3.0 \\
race & Race (one-hot, 7 dims) & White & [0,1,0,0,0,0,0] \\
education & Education level & 3 & -1.0 \\
bmi & Body mass index & 34.67 & 7.67 \\
copd & COPD diagnosis & 0 & 0.0 \\
phist & Personal cancer history & 0 & 0.0 \\
fhist & Family lung cancer history & 0 & 0.0 \\
smo\_status & Smoking status & Former & -1.0 \\
smo\_intensity & Cigarettes per day & 40 & -0.152 \\
smo\_duration & Years smoked & 41 & 14.0 \\
quit\_time & Years since quitting & 10 & 0.0 \\
\bottomrule
\end{tabular}

\end{table}

\smallskip\noindent
\textbf{Generated Text Report (Modality C):} ``The patient has occupational exposure to asbestos and agricultural dusts. Medical history is significant for pneumonia. The patient is exposed to secondhand smoke at home and secondhand smoke at workplace.''

\smallskip\noindent
\textbf{Text Generation Process.} The synthetic text report is generated from 13 binary variables not included in Modality B: 6 occupational exposures (asbestos, chemicals, coal dust, agricultural dusts, firefighting smoke, welding fumes), 5 medical diagnoses (diabetes, heart disease, hypertension, pneumonia, stroke), and 2 environmental smoke exposures (home, workplace). This ensures that Modality C provides information complementary to Modality B rather than redundant.