\section{Click prompting and robustness to annotation variability}
\label{sec:appendix_click_protocol}

\subsection{Click prompting protocol}
\label{subsec:appendix_click_protocol}
To enable a controlled SAM~2 vs.\ SAM~3 comparison under click prompting, we generate prompts in a model-independent manner and reuse identical prompt coordinates for both models. We therefore avoid oracle-style interactive protocols that place later clicks based on model-specific error regions, which would yield different click sequences across models. Instead, we define all clicks solely from the ground-truth mask on the initialization frame and then propagate without further interaction.

For each target object, we define the initialization frame $t_0$ as the first frame in which the ground-truth mask is non-empty. Given the binary ground-truth mask $M_{gt}^{(t_0)}$ on this frame, we follow the SAM~2 and SAM~3 prompt protocol and place the initial positive click at the most interior point of the object, defined as the pixel attaining the maximum of the Euclidean distance transform of $M_{gt}^{(t_0)}$ (i.e., the foreground pixel farthest from the mask boundary). This yields a deterministic positive click location per object.
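The positive-click rule can be illustrated with a minimal Python sketch. It is not the released implementation: the function name \texttt{most\_interior\_click}, the one-pixel background padding (so that the image border counts as a mask boundary), and the row-major tie-breaking are assumptions made for this example; only \texttt{scipy.ndimage.distance\_transform\_edt} is a real library call.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def most_interior_click(mask: np.ndarray) -> tuple[int, int]:
    """Return (row, col) of the foreground pixel farthest from the background."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask is empty; t_0 must be a frame with a non-empty mask")
    # Assumption: pad with background so the image border acts as a boundary.
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # Distance from every foreground pixel to the nearest background pixel.
    dist = distance_transform_edt(padded)[1:-1, 1:-1]
    # np.argmax returns the first maximum in row-major order, so ties are
    # resolved deterministically (assumed tie-breaking rule).
    row, col = np.unravel_index(np.argmax(dist), dist.shape)
    return int(row), int(col)


# Toy example: a 5x5 square whose most interior pixel is its center.
mask = np.zeros((9, 9), dtype=bool)
mask[2:7, 2:7] = True
print(most_interior_click(mask))  # (4, 4)
```

Because the result depends only on the mask, calling the function once per object and storing the coordinates is enough to reuse the same click for both models.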

For the (1,2) setting, we add two negative clicks sampled from a local neighborhood around the object. We define a neighborhood ring as the set difference between a dilated mask and the original foreground mask, i.e., $\mathcal{N}=\text{dilate}(M_{gt}^{(t_0)})\setminus M_{gt}^{(t_0)}$ with a fixed dilation radius. The first negative click is selected as the point in $\mathcal{N}$ farthest from the object centroid, and the second negative click is selected via a maximin criterion to be as far as possible from the first negative click (while remaining in $\mathcal{N}$). This procedure is deterministic given $M_{gt}^{(t_0)}$ and produces the same negative click locations for both models.
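A minimal Python sketch of the negative-click selection is given below for illustration. Several details are assumptions rather than part of the protocol as stated: the function name \texttt{negative\_clicks}, the disk-shaped structuring element, the row-major tie-breaking, and the radius used in the toy example (the dilation radius is deliberately left as a required argument).

```python
import numpy as np
from scipy.ndimage import binary_dilation


def negative_clicks(mask: np.ndarray, radius: int) -> list[tuple[int, int]]:
    """Pick two (row, col) negative clicks from the ring dilate(mask) \\ mask."""
    mask = np.asarray(mask, dtype=bool)
    # Assumption: dilation uses a disk-shaped structuring element.
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = yy**2 + xx**2 <= radius**2
    ring = binary_dilation(mask, structure=disk) & ~mask
    pts = np.argwhere(ring)  # candidate pixels in row-major order
    if len(pts) < 2:
        raise ValueError("neighborhood ring has fewer than two pixels")
    centroid = np.argwhere(mask).mean(axis=0)
    # First negative click: ring pixel farthest from the object centroid.
    first = pts[np.argmax(np.linalg.norm(pts - centroid, axis=1))]
    # Second negative click: with a single click already chosen, the maximin
    # criterion reduces to the ring pixel farthest from that first click.
    second = pts[np.argmax(np.linalg.norm(pts - first, axis=1))]
    return [tuple(map(int, first)), tuple(map(int, second))]


# Toy example: 3x3 object with an illustrative radius of 2 pixels.
mask = np.zeros((11, 11), dtype=bool)
mask[4:7, 4:7] = True
neg = negative_clicks(mask, radius=2)
print(neg)
```

Both clicks are functions of the mask and the radius alone, which is what makes the resulting coordinates identical for SAM~2 and SAM~3.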

All click coordinates are saved per case and reused across SAM~2 and SAM~3. Consequently, any performance differences under click prompting reflect differences in prompt interpretation and propagation rather than differences in prompt placement.

\begin{table*}[!htbp]
\centering
\scriptsize
\setlength{\tabcolsep}{3.5pt}
\caption{Robustness to click jitter under single-click (1,0) and multi-click (1,2) prompting. We use a jitter radius of $J=5$ pixels (offsets drawn from $[-J,J]$ along each axis) and $K=5$ jitter trials per case. For each dataset/structure, we report expected DSC under jitter as $\mu_{\text{jitter}}\pm\sigma_{\text{jitter}}$ (baseline), where $\mu_{\text{jitter}}$ is the mean DSC under jitter averaged over cases, $\sigma_{\text{jitter}}$ is the \emph{average within-case} standard deviation across jitter trials (jitter sensitivity), and the baseline is the mean canonical-click DSC across cases. All values are in \%.}
\label{tab:click_jitter}
\begin{tabular}{c|c|cc|cc}
\toprule
\multirow{2}{*}{\textbf{Dataset}} &
\multirow{2}{*}{\textbf{Structure}} &
\multicolumn{2}{c|}{\textbf{Click (1,0)}} &
\multicolumn{2}{c}{\textbf{Click (1,2)}} \\
\cline{3-6}
& & \textbf{SAM 2} & \textbf{SAM 3} & \textbf{SAM 2} & \textbf{SAM 3} \\
\midrule

\multirow{13}{*}{BTCV}
& Adrenal Gland (L) & 6.8$_{\pm 1.5}$ (6.9) & 16.3$_{\pm 4.1}$ (15.6) & 9.2$_{\pm 1.7}$ (9.7) & 26.3$_{\pm 7.1}$ (22.7) \\
& Adrenal Gland (R) & 2.6$_{\pm 2.0}$ (3.7) & 13.3$_{\pm 6.2}$ (14.5) & 5.3$_{\pm 2.2}$ (6.9) & 17.1$_{\pm 5.6}$ (17.3) \\
& Aorta & 88.0$_{\pm 0.1}$ (88.0) & 90.6$_{\pm 0.2}$ (90.6) & 87.1$_{\pm 0.6}$ (87.2) & 90.8$_{\pm 0.2}$ (90.8) \\
& Esophagus & 2.7$_{\pm 1.5}$ (5.1) & 50.4$_{\pm 5.3}$ (52.9) & 9.6$_{\pm 1.9}$ (9.9) & 54.4$_{\pm 8.3}$ (56.0) \\
& Gallbladder & 22.6$_{\pm 4.5}$ (19.4) & 30.9$_{\pm 2.7}$ (32.1) & 32.3$_{\pm 6.2}$ (31.9) & 33.3$_{\pm 4.1}$ (31.1) \\
& Inferior Vena Cava & 63.4$_{\pm 1.3}$ (64.6) & 75.6$_{\pm 1.2}$ (74.9) & 58.6$_{\pm 2.7}$ (58.6) & 75.5$_{\pm 0.3}$ (75.5) \\
& Kidney (L) & 55.0$_{\pm 4.4}$ (60.4) & 54.1$_{\pm 5.8}$ (54.7) & 70.1$_{\pm 4.8}$ (71.6) & 64.6$_{\pm 4.4}$ (67.5) \\
& Kidney (R) & 57.8$_{\pm 3.3}$ (56.4) & 67.6$_{\pm 4.0}$ (66.0) & 66.8$_{\pm 4.0}$ (68.2) & 68.6$_{\pm 5.2}$ (70.4) \\
& Liver & 45.0$_{\pm 4.3}$ (45.2) & 68.8$_{\pm 4.9}$ (71.5) & 52.6$_{\pm 3.8}$ (52.2) & 77.6$_{\pm 1.9}$ (80.0) \\
& Pancreas & 20.4$_{\pm 4.3}$ (22.9) & 34.8$_{\pm 3.8}$ (35.0) & 25.2$_{\pm 5.1}$ (27.4) & 38.4$_{\pm 4.1}$ (40.8) \\
& Portal \& Splenic Veins & 29.2$_{\pm 2.9}$ (27.7) & 31.4$_{\pm 3.1}$ (31.4) & 31.9$_{\pm 3.8}$ (31.0) & 33.0$_{\pm 3.8}$ (31.6) \\
& Spleen & 56.8$_{\pm 2.5}$ (57.5) & 61.1$_{\pm 1.5}$ (59.2) & 66.2$_{\pm 2.0}$ (65.8) & 61.4$_{\pm 2.5}$ (59.6) \\
& Stomach & 33.5$_{\pm 6.3}$ (27.0) & 48.9$_{\pm 5.0}$ (45.7) & 38.6$_{\pm 3.3}$ (39.2) & 49.6$_{\pm 4.5}$ (49.4) \\
\midrule

\multirow{13}{*}{FLARE22}
& Adrenal Gland (L) & 24.5$_{\pm 3.9}$ (26.4) & 27.5$_{\pm 4.1}$ (30.9) & 27.3$_{\pm 3.1}$ (27.4) & 42.3$_{\pm 6.4}$ (41.6) \\
& Adrenal Gland (R) & 8.9$_{\pm 4.3}$ (12.1) & 10.8$_{\pm 4.3}$ (9.3) & 12.7$_{\pm 4.8}$ (12.2) & 21.5$_{\pm 10.0}$ (21.4) \\
& Aorta & 93.3$_{\pm 0.1}$ (93.3) & 95.9$_{\pm 0.1}$ (95.9) & 93.5$_{\pm 0.9}$ (93.8) & 95.9$_{\pm 0.1}$ (95.9) \\
& Duodenum & 25.7$_{\pm 2.1}$ (25.7) & 32.4$_{\pm 3.8}$ (30.7) & 28.2$_{\pm 4.3}$ (26.9) & 33.9$_{\pm 2.5}$ (32.4) \\
& Esophagus & 4.2$_{\pm 1.1}$ (3.2) & 31.6$_{\pm 4.1}$ (27.5) & 6.4$_{\pm 1.5}$ (7.1) & 41.7$_{\pm 7.2}$ (43.6) \\
& Gallbladder & 35.9$_{\pm 2.1}$ (36.0) & 47.2$_{\pm 2.9}$ (47.6) & 38.4$_{\pm 5.5}$ (39.9) & 48.8$_{\pm 4.5}$ (50.0) \\
& Inferior Vena Cava & 73.4$_{\pm 0.3}$ (73.4) & 81.7$_{\pm 0.5}$ (81.9) & 70.1$_{\pm 2.6}$ (69.2) & 82.0$_{\pm 0.3}$ (80.7) \\
& Kidney (L) & 80.2$_{\pm 2.8}$ (82.0) & 78.4$_{\pm 1.0}$ (78.1) & 82.0$_{\pm 2.8}$ (78.9) & 78.8$_{\pm 1.9}$ (78.6) \\
& Kidney (R) & 85.5$_{\pm 0.9}$ (86.0) & 87.2$_{\pm 1.4}$ (87.7) & 86.5$_{\pm 1.3}$ (86.6) & 89.9$_{\pm 2.1}$ (90.5) \\
& Liver & 72.2$_{\pm 3.5}$ (73.6) & 86.6$_{\pm 1.6}$ (86.3) & 79.6$_{\pm 0.6}$ (80.0) & 87.5$_{\pm 1.6}$ (86.5) \\
& Pancreas & 28.4$_{\pm 2.7}$ (28.2) & 40.5$_{\pm 4.8}$ (38.8) & 32.2$_{\pm 4.8}$ (32.9) & 45.2$_{\pm 6.1}$ (44.4) \\
& Spleen & 71.3$_{\pm 2.1}$ (70.9) & 72.9$_{\pm 1.1}$ (72.4) & 78.5$_{\pm 3.5}$ (79.1) & 75.3$_{\pm 0.8}$ (75.4) \\
& Stomach & 44.2$_{\pm 4.0}$ (42.7) & 51.9$_{\pm 2.6}$ (53.1) & 48.8$_{\pm 6.1}$ (49.7) & 55.3$_{\pm 2.5}$ (53.6) \\
\midrule

\multirow{1}{*}{MSD Heart}
& Left Atrium & 17.6$_{\pm 0.8}$ (17.7) & 35.5$_{\pm 8.6}$ (30.4) & 25.2$_{\pm 3.4}$ (26.2) & 43.4$_{\pm 9.6}$ (43.7) \\
\midrule

\multirow{3}{*}{CAMUS}
& Left Atrium & 19.5$_{\pm 0.0}$ (19.5) & 28.1$_{\pm 0.9}$ (28.6) & 30.2$_{\pm 1.2}$ (30.3) & 65.9$_{\pm 2.8}$ (66.1) \\
& LV Endocardium & 27.4$_{\pm 0.0}$ (27.5) & 67.8$_{\pm 1.3}$ (67.5) & 62.5$_{\pm 2.0}$ (62.5) & 72.5$_{\pm 2.2}$ (73.0) \\
& LV Epicardium & 24.2$_{\pm 0.1}$ (24.2) & 28.2$_{\pm 1.7}$ (28.0) & 26.2$_{\pm 0.8}$ (25.8) & 26.8$_{\pm 2.2}$ (27.2) \\
\midrule

\multirow{12}{*}{CholecSeg8K}  
& Abdominal Wall & 55.5$_{\pm 0.1}$ (55.8) & 69.5$_{\pm 2.8}$ (67.4) & 57.3$_{\pm 2.4}$ (58.1) & 78.3$_{\pm 2.5}$ (79.4) \\
& Blood & 11.0$_{\pm 0.2}$ (5.1) & 13.5$_{\pm 0.4}$ (12.9) & 8.0$_{\pm 0.1}$ (7.9) & 40.9$_{\pm 2.9}$ (38.8) \\
& Connective Tissue & 70.4$_{\pm 0.1}$ (70.5) & 66.3$_{\pm 0.1}$ (66.3) & 65.8$_{\pm 0.1}$ (65.7) & 61.2$_{\pm 0.2}$ (61.2) \\
& Cystic Duct & 0.1$_{\pm 0.0}$ (0.1) & 0.2$_{\pm 0.1}$ (0.1) & 0.2$_{\pm 0.0}$ (0.2) & 0.1$_{\pm 0.0}$ (0.1) \\
& Fat & 60.7$_{\pm 0.8}$ (60.0) & 71.5$_{\pm 0.4}$ (71.4) & 60.7$_{\pm 2.2}$ (61.7) & 61.0$_{\pm 3.9}$ (60.9) \\
& Gallbladder & 71.7$_{\pm 2.2}$ (72.5) & 74.9$_{\pm 2.8}$ (79.2) & 75.3$_{\pm 1.3}$ (75.0) & 79.5$_{\pm 3.1}$ (75.4) \\
& GI Tract & 38.7$_{\pm 1.3}$ (37.8) & 66.1$_{\pm 4.3}$ (65.0) & 32.5$_{\pm 1.8}$ (31.5) & 70.9$_{\pm 2.0}$ (70.8) \\
& Grasper & 74.8$_{\pm 0.2}$ (74.6) & 80.5$_{\pm 1.9}$ (82.5) & 77.7$_{\pm 2.3}$ (75.6) & 79.1$_{\pm 2.0}$ (78.2) \\
& Hepatic Vein & 19.5$_{\pm 0.0}$ (19.5) & 21.6$_{\pm 0.1}$ (21.6) & 19.8$_{\pm 0.1}$ (20.1) & 21.8$_{\pm 0.0}$ (21.7) \\
& L-Hook Electrocautery & 66.8$_{\pm 0.1}$ (66.8) & 68.3$_{\pm 2.3}$ (64.5) & 65.6$_{\pm 0.3}$ (65.9) & 68.5$_{\pm 0.8}$ (68.6) \\
& Liver & 57.9$_{\pm 2.1}$ (57.4) & 63.7$_{\pm 5.6}$ (59.8) & 59.3$_{\pm 1.4}$ (60.9) & 66.3$_{\pm 3.1}$ (68.5) \\
& Liver Ligament & 98.7$_{\pm 0.0}$ (98.7) & 98.3$_{\pm 0.0}$ (98.3) & 98.6$_{\pm 0.0}$ (98.6) & 97.4$_{\pm 0.8}$ (96.2) \\
\bottomrule
\end{tabular}
\end{table*}

\subsection{Robustness to annotation noise via click jitter}
To assess the sensitivity of click prompting to realistic annotation variability while keeping prompts identical across models, we perform a click-jitter experiment in which the initialization prompt set is recomputed under small random perturbations. For each case, we start from the original, non-jittered positive click (the ``canonical click'') on the initialization frame $t_0$ and sample independent spatial offsets $\delta_x,\delta_y \sim \mathrm{Unif}(-J,J)$ to obtain a jittered positive click. For multi-click prompting (1,2), negative clicks are recomputed conditioned on the jittered positive click using the same ring-based protocol described in Appendix~\ref{subsec:appendix_click_protocol}, so that the entire prompt set varies coherently with the user's initial placement. We re-run inference for each of the $K$ jittered prompt sets per case.
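The positive-click perturbation can be sketched in Python as follows. This is an illustration under stated assumptions: offsets are drawn from a continuous uniform distribution without rounding or clipping, the helper name \texttt{jitter\_click} is hypothetical, and the canonical coordinates and seed in the example are placeholders rather than values from our experiments.

```python
import numpy as np


def jitter_click(click, J, rng):
    """Offset an (x, y) click by independent draws from Unif(-J, J)."""
    dx, dy = rng.uniform(-J, J, size=2)
    return float(click[0] + dx), float(click[1] + dy)


# A fixed seed lets the same jittered clicks be stored and reused across models.
rng = np.random.default_rng(0)      # placeholder seed
canonical = (120.0, 85.0)           # placeholder canonical click (x, y)
jittered = [jitter_click(canonical, J=5, rng=rng) for _ in range(5)]  # K = 5
for x, y in jittered:
    print(f"{x:.2f}, {y:.2f}")
```

Generating the $K$ jittered clicks once per case and saving them, as with the canonical clicks, keeps the perturbed prompts identical for both models.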

For each case, we summarize jitter robustness by the mean DSC across the $K$ jitter trials and by the within-case standard deviation across those trials. We then aggregate at the dataset/structure level by averaging both quantities over cases, which yields $\mu_{\text{jitter}}$ and $\sigma_{\text{jitter}}$ (jitter sensitivity), respectively. For reference, we also report the DSC obtained with the original canonical click as a baseline. We report this sensitivity analysis on a subset of datasets spanning all four modalities: BTCV and FLARE22 (CT), MSD Heart (MR), CAMUS (ultrasound cine), and CholecSeg8K (endoscopy). All results are presented in Table~\ref{tab:click_jitter}.
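
One way to make this aggregation explicit is sketched below, using notation introduced here purely for illustration: $d_{i,k}$ denotes the DSC of case $i$ under jitter trial $k$, and $N$ is the number of cases for a given dataset/structure. Normalizing the within-case standard deviation by $K$ (rather than $K-1$) is an assumption; either convention is compatible with the description above.
\[
\bar{d}_i = \frac{1}{K}\sum_{k=1}^{K} d_{i,k}, \qquad
s_i = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\bigl(d_{i,k}-\bar{d}_i\bigr)^2},
\]
\[
\mu_{\text{jitter}} = \frac{1}{N}\sum_{i=1}^{N}\bar{d}_i, \qquad
\sigma_{\text{jitter}} = \frac{1}{N}\sum_{i=1}^{N} s_i .
\]
Under this reading, $\sigma_{\text{jitter}}$ reflects only the spread induced by click placement within a case and is not a measure of between-case variability.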
