\section{Method}
\label{sec:method}

Our only modification to a standard 3D Gaussian mapper is to wrap it in a
colon-intrinsic coordinate system based on an online centerline and its Bishop
frame (Figure \ref{fig:overview}).  This provides a natural parameter $s$ for insertion depth, tubular
coordinates $(s,r,\theta)$ for each Gaussian, and cheap coverage statistics in
the same space.  All rendering, networks, and optimizers are unchanged from a
MonoGS-style backbone; implementation details and hyperparameters are given in
Appendix~\ref{app:impl}. Throughout, we treat camera poses and depth as upstream inputs and focus on the mapping/representation layer; correcting slow pose drift or improving depth prediction is beyond the scope of this work. Our key intuition is that this thin geometric module is sufficient
to unlock both better use of Gaussian capacity and coverage-aware outputs at
minimal additional cost.

\subsection{Gaussian map and reconstruction loss}
\label{sec:gaussian_map}

Let $\{T_t\}_{t=1}^N \subset SE(3)$ denote per-frame camera poses and
$\{I_t\}_{t=1}^N$ the corresponding RGB images.
We maintain a set of 3D Gaussian primitives $\{g_i\}_{i=1}^M$, where each
$g_i$ has mean $\mu_i \in \mathbb{R}^3$, anisotropic covariance $S_i$, color,
and opacity.
A differentiable Gaussian rasterizer projects $\{g_i\}$ into keyframe views
$T_k$ to produce rendered images $\hat I_k$ and, where available, depth maps
$\hat D_k$.
Using observed keyframe RGB $I_k$ and depth $D_k$, we adopt standard
photometric and depth reconstruction losses
\begin{equation}
  \mathcal{L}_{\text{photo}}
  =
  \sum_{k \in \mathcal{K}} \sum_{p \in \Omega_k}
  \bigl\| I_k(p) - \hat I_k(p) \bigr\|_1,
  \qquad
  \mathcal{L}_{\text{depth}}
  =
  \sum_{k \in \mathcal{K}} \sum_{p \in \Omega_k}
  \bigl\| D_k(p) - \hat D_k(p) \bigr\|_1,
\end{equation}
where $\mathcal{K}$ is the set of keyframes and $\Omega_k$ the valid pixels.
We do not introduce new networks or detectors; all supervision comes from
existing tracking and depth components.

\paragraph{Front-end / back-end separation.}
The system is split into a light \emph{front-end} and a heavier
\emph{back-end}:

\begin{itemize}[nosep]
  \item \textbf{Front-end.} Processes every frame, maintains an online
  centerline $C(s)$, its Bishop frame, an arc-length-based keyframe schedule,
  and low-cost coverage counters in colon coordinates.  This runs at the
  native video frame rate and can provide insertion depth and coarse coverage
  in real time.
  \item \textbf{Back-end.} Operates only on keyframes.  Given the current
  centerline, it assigns tubular coordinates to Gaussians, evaluates colon
  priors, and updates the Gaussian map using the combined loss.  This mirrors
  the mapping component of MonoGS-style 3D Gaussian reconstruction, but treats
  poses as fixed and injects colon-aware geometric constraints.
\end{itemize}

This separation keeps the extra computation marginal while allowing the
Gaussian optimizer to focus on fewer, well-chosen views.

\subsection{Online centerline and Bishop frame}
\label{sec:centerline_bishop}

Our goal is to convert noisy colonoscopy motion into a smooth, 1D backbone
that approximates the scope path and serves as a clinically meaningful
coordinate for insertion depth and segment location.

\paragraph{Backbone construction.}
We represent the exploration path by a centerline curve
$C(s) \in \mathbb{R}^3$ parameterized by arc length $s$.
Online, we maintain a sparse set of \emph{backbone points}
$\{b_k\}_{k=1}^K$ derived from camera centers $x_t$.
A new backbone point is added when (i) the translational motion since
$b_K$ exceeds $d_{\min}$, (ii) the motion direction does not exceed a
maximum bend angle, and (iii) the candidate is not within $d_{\text{loop}}$
of non-neighbor backbone points (loop avoidance).
This filters small jitter while following the exploration path; full
thresholds are listed in Appendix \ref{app:impl}.

\paragraph{B-spline smoothing and arc length.}
Given $\{b_k\}$, we fit a cubic B-spline $\tilde C(u)$, $u \in [0,1]$, and
sample it at $\{u_m\}_{m=0}^M$.
We compute cumulative distances
$s_0=0$, $s_m = s_{m-1} + \|\tilde C(u_m) - \tilde C(u_{m-1})\|_2$,
store positions $C_m = \tilde C(u_m)$ and arc-lengths $s_m$, and query
intermediate points by linear interpolation in $s$.
The coordinate $s$ is therefore a smoothed approximation of physical
insertion depth, more useful for documentation and coverage than frame
index or raw Euclidean distance. We found that this strategy reduces sensitivity to high-frequency pose jitter, as seen in Figure \ref{fig:noisy_traj} in Appendix \ref{app:qual_noise}, but it does not correct slow systematic drift.

\paragraph{Bishop frame.}
To define a stable tubular coordinate system we compute a Bishop frame
$\{T(s),N_1(s),N_2(s)\}$ along $C$.
For each sample $C_m$ we estimate the tangent $T_m$ by finite
differences and propagate an orthonormal pair of normals
$(N_{1,m},N_{2,m})$ using discrete parallel transport, i.e.\ the minimal
rotation taking $T_{m-1}$ to $T_m$ followed by re-orthogonalization.
This yields an orthonormal basis with minimal twist at each $s_m$, and is
numerically stable even in nearly straight segments where a Frenet frame
would be ill-conditioned.

We initialize $N_{1,0}$ by projecting a fixed reference direction (e.g.\ the first keyframe camera ``up'' vector) onto the plane orthogonal to $T_0$ and normalizing; we set $N_{2,0}=T_0 \times N_{1,0}$. To prevent occasional sign flips from discretization/noise, we enforce continuity by checking $\langle N_{1,m},N_{1,m-1}\rangle$: if negative, we flip both normals $(N_{1,m},N_{2,m}) \leftarrow (-N_{1,m},-N_{2,m})$. This preserves a consistent $\theta$ convention for unrolled coverage.


\subsection{Colon-intrinsic coordinates}
\label{sec:tubular_coords}

The Bishop frame turns the colon into a tubular coordinate system.
Given a 3D point $x$ (e.g.\ a Gaussian center $\mu_i$), we first find its
closest point on the centerline by minimizing $\|x - C(s)\|_2$ over sampled
$\{s_m\}$ and, optionally, refining with a 1D line search, obtaining
$s^\star$ and $C^\star = C(s^\star)$.
We then query the Bishop frame at $s^\star$,
$T^\star,N_1^\star,N_2^\star$, and express the offset
$\Delta x = x - C^\star$ as
\[
u = \Delta x \cdot N_1^\star,\quad
v = \Delta x \cdot N_2^\star,\quad
\ell = \Delta x \cdot T^\star.
\]
The radial distance and circumferential angle are
\[
r(x) = \sqrt{u^2 + v^2}, \qquad
\theta(x) = \operatorname{atan2}(v,u),
\]
so that $(s^\star, r(x), \theta(x))$ defines colon-intrinsic coordinates
for $x$.
We use these coordinates for keyframe selection, colon-aware priors, and
coverage accumulation.
Clinically, $s^\star$ aligns with insertion depth and segment, while
$\theta(x)$ captures circumferential location on the mucosal surface.
We define $\theta=0$ along $N_1(s)$ and increase $\theta$ toward $N_2(s)$ via the right-hand rule around $T(s)$. For binning and visualization, we wrap $\theta$ to $(-\pi,\pi]$ (or equivalently $[0,2\pi)$) with circular boundary handling.

\subsection{Arc-length-based keyframing}
\label{sec:arclen_keyframes}

Standard keyframe selection thresholds on Euclidean camera motion or image
overlap, which oversamples straight segments and undersamples tight bends in
a tubular organ.
We instead select keyframes by distance traveled along the centerline.

At frame $t$ with camera center $x_t$, we estimate the arc-length increment
$\Delta s_t$ since frame $t-1$:
if the centerline already covers both positions, we project $x_{t-1}$ and
$x_t$ onto $C$ to obtain $s_{t-1}$ and $s_t$ and set
$\Delta s_t = |s_t - s_{t-1}|$; otherwise, while the centerline is still
growing, we approximate forward progress using the tangent at the endpoint
and the signed projection of $(x_t - x_{t-1})$ onto that tangent, clipped at
zero.
We maintain an accumulator $A_t = A_{t-1} + \Delta s_t$ with $A_0 = 0$ and
create a new keyframe when standard appearance- and time-based criteria are
met and $A_t \ge \tau_{\text{KF}}$, where $\tau_{\text{KF}}$ is the desired
spacing in millimeters.
The accumulator is then reset.
This yields approximately uniform keyframe density in $s$ and automatically
densifies keyframes in high-curvature segments where coverage is more
challenging.

\subsection{Colon-aware priors and total loss}
\label{sec:centerline_losses}

Once tubular coordinates are defined, we use them to regularize the
reconstruction toward an anatomically plausible hollow tube with a smooth
backbone.
This ties the Gaussian map to colon geometry and reduces the search space
for the optimizer.

\paragraph{Radial tube prior.}
The colon wall is approximately tubular around the centerline at a
characteristic radius $r_{\text{wall}}$.
For each Gaussian $i$ with center $x_i$ we compute its radius
$r_i = r(x_i)$ and penalize deviations from $R_{\text{wall}}$ via
\begin{equation}
  \mathcal{L}_{\text{radial}}
  =
  \sum_i
  % w_i \,
  \rho \!\bigl((r_i - r_{\text{wall}})^2\bigr),
  \label{eq:radial_loss}
\end{equation}
where $\rho(\cdot)$ is a robust penalty.
% and $w_i$ is an optional weight (e.g.\ down-weighting low-coverage or highly specular regions).
% This encourages Gaussians to concentrate near the mucosal surface instead of filling the lumen interior. 
This term is a soft, robust regularizer: it primarily discourages lumen ``fill-in'' and off-wall outliers while the data terms determine local wall shape.

\paragraph{Centerline smoothness prior.}
To avoid spurious kinks from noisy motion, we regularize the discrete
second derivative of the sampled centerline positions $\{C_m\}$:
\begin{equation}
  \mathcal{L}_{\text{curv}}
  =
  \sum_{m=1}^{M-1}
  \bigl\| C_{m+1} - 2 C_m + C_{m-1} \bigr\|_2^2.
  \label{eq:centerline_curv}
\end{equation}
This penalizes rapid curvature changes while allowing anatomically
plausible bending and yields a reliable backbone for reporting and
unrolling.

\paragraph{Total loss.}
The back-end minimizes a combination of photometric, geometric, and
colon-aware terms:
\begin{equation}
  \mathcal{L}
  =
  \mathcal{L}_{\text{photo}}
  +
  \mathcal{L}_{\text{depth}}
  +
  \lambda_{\text{radial}} \,\mathcal{L}_{\text{radial}}
  +
  \lambda_{\text{curv}} \,\mathcal{L}_{\text{curv}},
\end{equation}
with scalar weights $\lambda_{\text{radial}}$ and $\lambda_{\text{curv}}$.
The same tubular coordinates used to define these priors are also used to
compute unrolled coverage maps and segment-wise coverage statistics in
Section~\ref{sec:experiments}, so the representation that improves geometry
and efficiency also directly exposes clinically meaningful readouts.
