\section{Experiments}
\label{sec:experiments}

We evaluate whether a thin layer of colon-aware geometry on top of a standard
3D Gaussian mapper can (i) match the geometric accuracy of an endoscopy-specific
3DGS baseline, (ii) retain the frame rates of a MonoGS-style mapper, and
(iii) expose useful online coverage information at negligible additional cost.

\subsection{Datasets and protocol}

We use the four screening colonoscopy videos of the C3VD phantom
dataset~\cite{bobrow2023c3vd}, which were recorded by practicing
gastroenterologists. Each sequence contains 4{,}700--5{,}500 frames and comes
with ground-truth camera poses, RGB video, and a watertight CAD mesh of the
colon mold. All methods are supervised with the same per-frame monocular depth
predictions from~\cite{hardy2025coloncrafter}, and all images are resized to
$384\times384$ pixels.
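Rendering held-out frames from ground-truth poses at $384\times384$ requires intrinsics that match the resized images. The sketch below shows this adjustment under two assumptions not stated above: a $3\times3$ pinhole intrinsic matrix $K$, and the convention that pixel coordinates scale linearly with image size (no half-pixel-center correction). The function name \texttt{rescale\_intrinsics} is hypothetical.

\begin{verbatim}
```python
import numpy as np


def rescale_intrinsics(K, src_wh, dst_wh=(384, 384)):
    """Rescale a 3x3 pinhole intrinsic matrix for a resized image.

    Assumes pixel coordinates scale linearly with image size (no
    half-pixel-center correction). src_wh and dst_wh are (width, height).
    """
    sx = dst_wh[0] / src_wh[0]  # horizontal scale factor
    sy = dst_wh[1] / src_wh[1]  # vertical scale factor
    S = np.diag([sx, sy, 1.0])
    # Left-multiplying scales row 0 (fx, skew, cx) by sx and row 1 (fy, cy) by sy.
    return S @ np.asarray(K, dtype=float)
```
\end{verbatim}

With a non-square source, $f_x$ and $f_y$ are scaled by different factors, so the resized images have non-square pixels.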

To isolate the contribution of the mapping representation, all methods are
evaluated in a mapping-only setting with fixed ground-truth poses and identical
depth supervision. This avoids conflating front-end tracking failures with
reconstruction quality on long, fast-motion sequences. All methods process
every other frame, which gives an effective input rate of 15\,fps. Every 8th
processed frame is held out for validation, and the remaining frames are used
for optimization. This ratio balances evaluation frequency against the number
of frames available for map building. We run per-scene optimization over the
entire sequence and evaluate online: at each processed frame, the current map
renders all held-out frames seen so far. Additional implementation details are
given in Appendix~\ref{app:impl}.
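A minimal sketch of this frame schedule follows. It assumes that ``every 8th processed frame'' means the 8th, 16th, \ldots{} processed frames, so the first frame always contributes to the map. The offset and the function names are hypothetical.

\begin{verbatim}
```python
def split_frames(num_frames, stride=2, holdout_every=8):
    """Split raw frame indices into (train, held_out) lists.

    Assumption: frames 0, stride, 2*stride, ... are processed, and the
    8th, 16th, ... processed frames (1-based) are held out for validation.
    """
    train, held_out = [], []
    for i, frame in enumerate(range(0, num_frames, stride)):
        if (i + 1) % holdout_every == 0:
            held_out.append(frame)
        else:
            train.append(frame)
    return train, held_out


def online_eval_schedule(num_frames, stride=2, holdout_every=8):
    """Yield (frame, is_held_out, held_out_seen_so_far) per processed frame.

    At every step the current map would be rendered at all held-out
    frames seen so far.
    """
    seen = []
    for i, frame in enumerate(range(0, num_frames, stride)):
        is_held_out = (i + 1) % holdout_every == 0
        if is_held_out:
            seen.append(frame)
        yield frame, is_held_out, list(seen)
```
\end{verbatim}

Under this reading, a 5{,}000-frame sequence gives 2{,}500 processed frames, of which 312 are held out.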

\subsection{Baselines}

We compare against two 3D Gaussian mapping back-ends:

\noindent\textbf{EndoGSLAM.}
EndoGSLAM is an endoscopy-specific 3D Gaussian SLAM system originally
evaluated on short, robot-acquired C3VD sequences~\cite{wang2024endogslam}.
For our long phantom sequences we disable tracking and pose refinement and
feed ground-truth poses, so only the mapping component is active.

\noindent\textbf{MonoGS-style mapper.}
The MonoGS-style baseline follows a standard 3D Gaussian mapping pipeline
without colon-specific structure~\cite{Matsuki2024GaussianSplattingSLAM}. Its
depth-supervision and rendering losses are similar to ours. However, it does not
estimate a centerline, does not use tubular coordinates, and uses a
conventional frame-based keyframe policy. For a controlled comparison,
\textbf{Ours} and the \textbf{MonoGS-style mapper} share the same Gaussian
mapping backbone (renderer, optimization loop, densification/pruning logic,
opacity reset policy, and iteration schedule). They differ only in the
centerline/Bishop-frame module, arc-length keyframing, and the additional
colon-aware loss terms. \textbf{EndoGSLAM} uses its standard mapping
implementation with tracking and pose refinement disabled, as described above,
so the comparison reflects mapping behavior rather than front-end drift.
Additional hyperparameters are listed in Appendix~\ref{app:impl}.
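The sketch below contrasts the two keyframe policies. It shows one plausible form of arc-length keyframing: a frame becomes a keyframe once the camera's projection onto the centerline has moved by at least $\Delta s$ since the last keyframe. The frame-based policy, by contrast, simply selects every $k$-th frame. The threshold \texttt{delta\_s}, the coarse nearest-vertex projection, and all function names are illustrative assumptions rather than our exact settings.

\begin{verbatim}
```python
import numpy as np


def cumulative_arc_length(centerline):
    """Arc length s at each vertex of a polyline centerline of shape (M, 3)."""
    seg = np.linalg.norm(np.diff(centerline, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(seg)])


def camera_arc_length(cam_center, centerline, cum_s):
    """Coarse projection: arc length of the centerline vertex nearest the camera."""
    idx = int(np.argmin(np.linalg.norm(centerline - cam_center, axis=1)))
    return cum_s[idx]


def arc_length_keyframes(cam_centers, centerline, delta_s=5.0):
    """Hypothetical rule: new keyframe whenever |s - s_last| >= delta_s."""
    cum_s = cumulative_arc_length(centerline)
    keyframes = [0]
    last_s = camera_arc_length(cam_centers[0], centerline, cum_s)
    for t in range(1, len(cam_centers)):
        s = camera_arc_length(cam_centers[t], centerline, cum_s)
        if abs(s - last_s) >= delta_s:  # abs(): insertion and withdrawal both count
            keyframes.append(t)
            last_s = s
    return keyframes


def frame_based_keyframes(num_frames, every=5):
    """Conventional policy: every k-th frame, regardless of camera motion."""
    return list(range(0, num_frames, every))
```
\end{verbatim}

Under this rule, a camera that dwells in one place adds no keyframes, while a fast withdrawal adds them in proportion to the distance travelled. A frame-count policy cannot adapt in this way.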

All methods use exactly the same inputs (RGB, predicted depth, and ground-truth
poses) and run on the same NVIDIA RTX 6000 GPU.

\subsection{Metrics}

\paragraph{Reconstruction quality.}
For each held-out frame $k$, we render $\hat I_k$ from its ground-truth pose
and compute PSNR and SSIM against the observed RGB image $I_k$. For geometry,
we compute a one-directional Chamfer distance (CD, in mm) from the
reconstruction to the phantom mesh. We sample point clouds from both and
compute, for each reconstruction point, the distance to its nearest mesh point.
These distances are averaged over the reconstruction points that lie within a
fixed radial band around the centerline, which excludes distant background.
Because it is one-directional, CD measures accuracy (how close reconstructed
points lie to the true wall) rather than completeness.
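Let $\mathcal P$ be the points sampled from the reconstruction, $\mathcal M$ the points sampled from the phantom mesh, and $r(\mathbf p)$ the distance of $\mathbf p$ to the centerline. The metric is then
\[
\mathrm{CD}=\frac{1}{|\mathcal P_B|}\sum_{\mathbf p\in\mathcal P_B}\min_{\mathbf q\in\mathcal M}\|\mathbf p-\mathbf q\|_2,
\qquad
\mathcal P_B=\{\mathbf p\in\mathcal P:\ r_{\min}\le r(\mathbf p)\le r_{\max}\}.
\]
The sketch below implements this with brute-force NumPy nearest neighbours. Several details are assumptions made for illustration: the band limits $r_{\min}$ and $r_{\max}$, the approximation of $r(\mathbf p)$ by the distance to the nearest centerline sample, and the function names. For full-resolution clouds, a k-d tree (e.g.\ \texttt{scipy.spatial.cKDTree}) would replace the pairwise computation.

\begin{verbatim}
```python
import numpy as np


def nearest_distances(src, dst, chunk=1024):
    """Distance from each row of src (N, 3) to its nearest row in dst (M, 3).

    Brute force in chunks; suitable for small illustrative clouds only.
    """
    out = np.empty(len(src))
    for i in range(0, len(src), chunk):
        diff = src[i:i + chunk, None, :] - dst[None, :, :]
        out[i:i + chunk] = np.linalg.norm(diff, axis=-1).min(axis=1)
    return out


def banded_one_sided_chamfer(recon_pts, mesh_pts, centerline_pts,
                             r_min=0.0, r_max=np.inf):
    """Mean reconstruction-to-mesh distance over points inside a radial band."""
    recon_pts = np.asarray(recon_pts, dtype=float)
    mesh_pts = np.asarray(mesh_pts, dtype=float)
    centerline_pts = np.asarray(centerline_pts, dtype=float)
    r = nearest_distances(recon_pts, centerline_pts)  # radial-distance proxy
    in_band = (r >= r_min) & (r <= r_max)
    if not in_band.any():
        return float("nan")
    return float(nearest_distances(recon_pts[in_band], mesh_pts).mean())
```
\end{verbatim}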

\paragraph{Runtime and memory.}
Effective FPS is defined as $\text{FPS} = N/t$, where $N$ is the number of
processed frames (keyframes and non-keyframes alike) and $t$ is the total
wall-clock time for the sequence, including all components of each method. As
a proxy for model size and memory, we also report the number of active
Gaussians at the end of optimization.
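The measurement can be sketched as follows. Here \texttt{process\_frame} is a hypothetical callable that stands in for one method's full per-frame work, including mapping, keyframe logic, and any bookkeeping. GPU kernels run asynchronously, so a CUDA implementation would need to synchronize (for example with \texttt{torch.cuda.synchronize()}) before reading the clock.

\begin{verbatim}
```python
import time
from typing import Any, Callable, Iterable


def effective_fps(frames: Iterable[Any],
                  process_frame: Callable[[Any], None]) -> float:
    """FPS = N / t over all processed frames, keyframes and non-keyframes alike."""
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one processed frame")
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)  # the method's complete per-frame work
    elapsed = time.perf_counter() - start  # total wall-clock time t
    return len(frames) / elapsed
```
\end{verbatim}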

\paragraph{Coverage.}
Using tubular coordinates $(s,r,\theta)$, we maintain online coverage
statistics for each centerline segment and circumferential bin:
(i) a scalar coverage score, defined as the fraction of time (i.e.\ of
processed frames) during which the segment or bin lies within a viewing cone
of the current camera, and
(ii) a histogram of Gaussian counts over $\theta$ (``quadrants'').
These statistics are updated online as $s$ grows and are later compared with
a visibility oracle derived from the phantom mesh
(Appendix~\ref{app:perseq}). We refer to these outputs as \emph{geometric
coverage}. They are visibility-based proxies derived from pose and geometry,
and do not directly measure mucosal visualization quality under specularities,
debris, blur, or occlusion behind folds.
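The sketch below illustrates how a point is mapped to $(s,r,\theta)$. It first builds a discrete Bishop (rotation-minimizing) frame along a polyline centerline: the normal is carried from vertex to vertex and re-orthogonalized against each new tangent. Points are then expressed in that frame. This projection-based transport is a simple approximation chosen for brevity, and more accurate schemes such as the double-reflection method exist. The sketch also makes these simplifying assumptions: a coarse nearest-vertex projection, hypothetical function names, and consecutive tangents that never turn by $90^\circ$ or more. It is not necessarily the scheme used in our module.

\begin{verbatim}
```python
import numpy as np


def bishop_frames(centerline):
    """Unit tangent T and normals U, V per vertex via projection-based transport."""
    T = np.gradient(np.asarray(centerline, dtype=float), axis=0)
    T /= np.linalg.norm(T, axis=1, keepdims=True)
    U = np.zeros_like(T)
    # Initial normal: any unit vector orthogonal to the first tangent.
    ref = np.array([1.0, 0.0, 0.0]) if abs(T[0, 0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    U[0] = ref - ref.dot(T[0]) * T[0]
    U[0] /= np.linalg.norm(U[0])
    for i in range(1, len(T)):
        # Carry the previous normal forward; drop its component along the new tangent.
        u = U[i - 1] - U[i - 1].dot(T[i]) * T[i]
        U[i] = u / np.linalg.norm(u)
    V = np.cross(T, U)  # completes the right-handed frame (T, U, V)
    return T, U, V


def to_tubular(points, centerline):
    """Map world points (P, 3) to (s, r, theta) w.r.t. the nearest centerline vertex."""
    points = np.asarray(points, dtype=float)
    centerline = np.asarray(centerline, dtype=float)
    seg = np.linalg.norm(np.diff(centerline, axis=0), axis=1)
    cum_s = np.concatenate([[0.0], np.cumsum(seg)])  # arc length per vertex
    T, U, V = bishop_frames(centerline)
    dists = np.linalg.norm(points[:, None, :] - centerline[None], axis=-1)
    idx = np.argmin(dists, axis=1)
    d = points - centerline[idx]
    # Remove the along-tangent component to get the in-plane radial offset.
    d_perp = d - np.sum(d * T[idx], axis=1, keepdims=True) * T[idx]
    r = np.linalg.norm(d_perp, axis=1)
    theta = np.arctan2(np.sum(d_perp * V[idx], axis=1), np.sum(d_perp * U[idx], axis=1))
    return cum_s[idx], r, theta
```
\end{verbatim}

The normal $\mathbf u$ is transported rather than recomputed from curvature, unlike in a Frenet frame. As a result, $\theta$ does not flip at inflection points or along straight stretches of the centerline, which keeps circumferential bins comparable along the colon.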

\subsection{Geometry--speed trade-off}

Table~\ref{tab:main_results} summarizes reconstruction quality, runtime, and
model size, averaged over the four C3VD sequences; per-sequence results are
given in Appendix~\ref{app:perseq}.

\begin{table}[t]
\centering
\caption{\textbf{Quantitative comparison on C3VD phantom sequences}
(ground-truth poses). Values are mean $\pm$ standard deviation over four
sequences. Higher is better for PSNR, SSIM, FPS; lower is better for CD.
Per-sequence scores are in Appendix~\ref{app:perseq}.}
\setlength{\tabcolsep}{4pt}
\begin{tabular}{lrrrrr}
\toprule
\textbf{Method} & \textbf{PSNR (dB)} $\uparrow$ & \textbf{SSIM} $\uparrow$ & \textbf{FPS} $\uparrow$ & \textbf{CD (mm)} $\downarrow$ & \textbf{\#Gaussians (M)} \\
\midrule
EndoGSLAM & $11.32 \pm 0.24$ & $0.346 \pm 0.025$ & $1.08 \pm 0.18$ & $6.61 \pm 1.35$ & $3.11 \pm 0.36$ \\
MonoGS    & $11.26 \pm 0.66$ & $0.320 \pm 0.053$ & $8.20 \pm 0.67$ & $7.91 \pm 0.56$ & $0.59 \pm 0.07$ \\
\rowcolor{rowgray}
Ours      & $11.56 \pm 0.92$ & $0.335 \pm 0.057$ & $6.73 \pm 0.43$ & $5.73 \pm 0.58$ & $1.14 \pm 0.21$ \\
\bottomrule
\end{tabular}
\label{tab:main_results}
\end{table}
Our centerline-aware mapper matches or slightly improves PSNR relative to both
baselines and attains the lowest mean Chamfer distance. Its CD is lower than
EndoGSLAM's despite using fewer than half as many Gaussians, and on average
$2.2$\,mm lower than that of the MonoGS baseline. For SSIM, our method falls
between EndoGSLAM and MonoGS. The differences across all three methods are
small relative to the standard deviations, suggesting that structural
similarity is not the primary axis of variation on this dataset.
At the same time, our effective FPS is close to that of MonoGS and
approximately $6\times$ higher than that of EndoGSLAM, even though we maintain
an online centerline, Bishop frame, and coverage counters. This trend holds
consistently across all four sequences (Appendix~\ref{app:perseq}). It
supports our claim that a thin geometric layer can match, and on average
improve on, the geometric accuracy that EndoGSLAM obtains from a heavier
mapping stack while retaining MonoGS-like speed.
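For reference, the ratios quoted above follow directly from the means in Table~\ref{tab:main_results}. They are ratios of means, not means of per-sequence ratios:
\[
\begin{array}{lcl}
\mathrm{FPS}_{\mathrm{Ours}}/\mathrm{FPS}_{\mathrm{EndoGSLAM}} &=& 6.73/1.08 \approx 6.2,\\
\mathrm{FPS}_{\mathrm{Ours}}/\mathrm{FPS}_{\mathrm{MonoGS}} &=& 6.73/8.20 \approx 0.82,\\
\mathrm{CD}_{\mathrm{MonoGS}}-\mathrm{CD}_{\mathrm{Ours}} &=& 7.91-5.73 = 2.18\ \mathrm{mm},\\
\mathrm{CD}_{\mathrm{EndoGSLAM}}-\mathrm{CD}_{\mathrm{Ours}} &=& 6.61-5.73 = 0.88\ \mathrm{mm},\\
N_{\mathrm{EndoGSLAM}}/N_{\mathrm{Ours}} &=& 3.11\,\mathrm{M}/1.14\,\mathrm{M} \approx 2.7.
\end{array}
\]
The $0.88$\,mm gap to EndoGSLAM is smaller than EndoGSLAM's across-sequence standard deviation ($1.35$\,mm). For that pair, the per-sequence results in Appendix~\ref{app:perseq} are therefore the more informative comparison.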

\subsection{Qualitative geometry comparison}

Figure~\ref{fig:qualitative} compares the reconstructed colon geometry on a
representative C3VD sequence. We show the ground-truth phantom mesh alongside
point clouds from our centerline-aware mapper, EndoGSLAM, and the MonoGS-style
baseline, annotated with Chamfer distance (CD), number of active Gaussians, and
effective FPS.

MonoGS achieves the highest FPS but leaves many of its Gaussians scattered
inside the lumen rather than on the wall, producing a thick, irregular tube and
higher CD. EndoGSLAM concentrates points more tightly on the wall, but at the
cost of roughly $3\times$ more Gaussians than our method and substantially
lower FPS. Our method forms a thin, continuous tubular shell that more closely
matches the phantom geometry. It uses fewer Gaussians than EndoGSLAM and runs
at near-MonoGS frame rates, consistent with the quantitative trade-off in
Table~\ref{tab:main_results}.

\begin{figure}[t]
  \centering
  \includegraphics[width=\linewidth]{figures/pc_compare.pdf}
  \caption{
  \textbf{Geometry--speed trade-off on a C3VD phantom sequence.}
  From left to right: ground-truth colon mesh and reconstructions from our
  centerline-aware mapper, EndoGSLAM, and a MonoGS-style mapper. Our method
  attains lower Chamfer distance than both baselines, uses roughly $3\times$
  fewer Gaussians than EndoGSLAM, and maintains near-MonoGS frame rates,
  yielding a thin tubular wall with fewer interior splats.
  }
  \label{fig:qualitative}
\end{figure}

\subsection{Coverage in colon coordinates}

A key advantage of expressing the map in tubular coordinates is that coverage
can be computed online in the same representation.
Figure~\ref{fig:coverage} visualizes our coverage output for one phantom
sequence.

Panel~(a) shows the colon unrolled into $(s,\theta)$ with coverage encoded as
a heatmap. Gaps or cold regions correspond to stretches that were rarely viewed
with favourable distance and angle; in the phantom sequences these often occur
around sharp bends and short withdrawal bursts. Panel~(b) aggregates this into
segment-wise summaries that can be inspected during or after a procedure,
highlighting under-inspected regions. Panel~(c) reports how Gaussians (and
therefore map capacity) are distributed over circumferential angle. A segment
whose Gaussians are heavily concentrated in one quadrant suggests that the
camera was directed mainly at that wall. The opposite wall may then have
received little attention, even if the overall time spent in that segment was
adequate.
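The circumferential balance in Panel~(c) can be sketched as a per-segment histogram over $\theta$. The sketch rests on several hypothetical choices, as are the function names:
\begin{itemize}
\item four equal bins starting at $\theta=0$ (the reference normal of the Bishop frame);
\item segments given by arc-length edges;
\item a simple imbalance indicator, namely the share of a segment's Gaussians that fall in its most populated bin.
\end{itemize}

\begin{verbatim}
```python
import numpy as np


def quadrant_histograms(s, theta, seg_edges, n_bins=4):
    """Gaussian counts per (segment, circumferential bin).

    seg_edges: increasing arc-length boundaries; values outside the range
    are clipped into the first or last segment.
    """
    seg = np.digitize(s, seg_edges[1:-1])  # segment index in [0, n_seg)
    ang = np.mod(theta, 2 * np.pi)  # map theta into [0, 2*pi)
    qb = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.zeros((len(seg_edges) - 1, n_bins), dtype=int)
    np.add.at(hist, (seg, qb), 1)  # unbuffered accumulation handles repeats
    return hist


def max_bin_share(hist):
    """Fraction of each segment's Gaussians in its most populated bin (NaN if empty)."""
    totals = hist.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, hist.max(axis=1) / totals, np.nan)
```
\end{verbatim}

With four bins, a share near $0.25$ indicates balanced quadrants, while a share near $1$ flags a segment observed almost entirely from one side.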

These coverage statistics are updated continuously as the centerline grows,
at negligible additional cost. They reuse the projections already computed to
render keyframes and require only per-segment, per-quadrant counters. In
Appendix~\ref{app:perseq} we compare our online coverage scores to an oracle
based on the phantom mesh and ground-truth poses and find good agreement,
supporting their use as geometry-aware quality indicators rather than purely
heuristic visualizations.
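The sketch below shows one minimal form such counters could take. It rests on the following assumptions:
\begin{itemize}
\item each (segment, circumferential bin) cell is represented by a single 3D point on the wall;
\item ``within the viewing cone'' means that the angle to the optical axis is below a half-angle and the distance lies in $[d_{\min}, d_{\max}]$;
\item for clarity, visibility is recomputed from the camera pose rather than reused from renderer projections.
\end{itemize}
The class name, thresholds, and cell representation are hypothetical. Like the geometric coverage defined above, the test ignores occlusion and image quality.

\begin{verbatim}
```python
import numpy as np


class CoverageCounters:
    """Hypothetical online counters over (segment, circumferential bin) cells."""

    def __init__(self, cell_centers, half_angle_deg=35.0, d_min=2.0, d_max=40.0):
        # cell_centers: (n_seg, n_bins, 3) world positions of wall cells.
        self.centers = np.asarray(cell_centers, dtype=float)
        self.cos_half = np.cos(np.deg2rad(half_angle_deg))
        self.d_min, self.d_max = d_min, d_max
        self.hits = np.zeros(self.centers.shape[:2], dtype=int)
        self.frames = 0

    def update(self, cam_center, cam_forward):
        """Count every cell inside the viewing cone of this processed frame."""
        v = self.centers - np.asarray(cam_center, dtype=float)
        dist = np.linalg.norm(v, axis=-1)
        fwd = np.asarray(cam_forward, dtype=float)
        fwd = fwd / np.linalg.norm(fwd)
        cos_angle = (v @ fwd) / np.maximum(dist, 1e-9)
        visible = (cos_angle >= self.cos_half) & (dist >= self.d_min) & (dist <= self.d_max)
        self.hits += visible  # booleans add as 0/1
        self.frames += 1

    def scores(self):
        """Fraction of processed frames in which each cell was in view."""
        return self.hits / max(self.frames, 1)

    def fraction_above(self, threshold):
        """Per-segment fraction of cells whose score reaches the threshold."""
        return (self.scores() >= threshold).mean(axis=1)
```
\end{verbatim}

Here \texttt{fraction\_above} approximates the segment-wise summary of Figure~\ref{fig:coverage}(b) by counting cells rather than weighting them by surface area.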

\begin{figure}[t]
  \centering
  \includegraphics[width=\linewidth]{figures/coverage.pdf}
  \caption{\textbf{Online coverage from tubular coordinates on a phantom
  sequence.} (a)~Unrolled coverage map in $(s,\theta)$: horizontal axis is arc
  length along the centerline, vertical axis is circumferential angle $\theta$,
  and color denotes our per-bin coverage score. Vertical lines mark anatomical
  segments of the phantom. (b)~Segment-wise coverage summaries: top, bar plot
  of the fraction of surface area in each segment exceeding a minimum coverage
  threshold; bottom, coverage fraction as a function of $s$.
  (c)~Circumferential balance: histogram of Gaussian counts over $\theta$,
  aggregated along the entire sequence (left) and for two example segments
  (right). Strong asymmetries suggest that the camera spent most of the time
  looking at one wall, so the opposite wall may warrant closer
  inspection.}
  \label{fig:coverage}
\end{figure}