\documentclass{midl} % Include author names
% \documentclass[anon]{midl} % Anonymized submission

% The following packages will be automatically loaded:
% jmlr, amsmath, amssymb, natbib, graphicx, url, algorithm2e
% ifoddpage, relsize and probably more
% make sure they are installed with your latex distribution

% \usepackage{mwe} % to get dummy images
% \usepackage{array}
\usepackage{multirow}
\usepackage{graphicx}
\usepackage{algorithm,algorithmic}
\usepackage{booktabs}
\usepackage{threeparttable}
\renewcommand{\algorithmicrequire}{\textbf{Input:}}
\renewcommand{\algorithmicensure}{\textbf{Output:}}

\jmlrvolume{-- Under Review}
\jmlryear{2024}
\jmlrworkshop{Full Paper -- MIDL 2024 submission}
\editors{Under Review for MIDL 2024}

\jmlryear{2024}\jmlrworkshop{Full Paper -- MIDL 2024}\jmlrvolume{-- 85}\editors{Accepted for publication at MIDL 2024}
\title[OFELIA]{OFELIA: Optical Flow-based Electrode LocalIzAtion}

\midlauthor{\Name{Xinyi Wang\nametag{$^{1,2,3}$}} \Email{xinyiwang@mail.ustc.edu.cn}\\
\addr $^{1}$ School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, 230026, P.R.China \\
\addr $^{2}$ Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, 215123, P.R.China\\
\addr $^{3}$ Shanghai MicroPort EP MedTech Co., Ltd. Shanghai, 201318, P.R.China\AND
\Name{Zikang Xu\nametag{$^{1,2}$}} \Email{zikangxu@mail.ustc.edu.cn}\AND
\Name{Qingsong Yao\nametag{$^{4}$}} \Email{yaoqingsong19@mails.ucas.edu.cn}\\
\addr $^{4}$ Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China \AND
\Name{Yiyong Sun\nametag{$^{3}$}} \Email{yiyong.sun@everpace.com}\AND
\Name{S. Kevin Zhou\nametag{$^{1,2,4}$}\midljointauthortext{Corresponding Author}} \Email{skevinzhou@ustc.edu.cn}
}


\begin{document}

\maketitle
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
%%
%% abstract
%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
\begin{abstract}

Catheter ablation is one of the most common procedures for treating atrial fibrillation and relies mainly on catheters whose electrodes collect electrophysiology signals.
Catheter electrode localization facilitates intraoperative catheter positioning, surgical planning, and other applications such as 3D model reconstruction.
In this paper, we propose a novel deep network for automatic electrode localization in an X-ray sequence, which integrates spatiotemporal features between adjacent frames, aided by optical flow maps.
To improve the utility and robustness of the proposed method, we first design a saturation-based pipeline for constructing an optical flow dataset, and then fine-tune the optical flow estimator to obtain more realistic, higher-contrast optical flow maps for electrode localization.
Extensive results on clinically challenging test sequences demonstrate the effectiveness of our method, with a mean radial error (MRE) of 0.95 mm for radiofrequency catheters and 0.71 mm for coronary sinus catheters, outperforming several state-of-the-art landmark detection methods.

\end{abstract}

\begin{keywords}
Catheter Electrode Detection, Optical Flow
\end{keywords}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
%%
%% introduction
%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
\section{Introduction}

Atrial fibrillation (AFib), atrial flutter, and premature ventricular contractions are prevalent manifestations of cardiac arrhythmias. Frequent cardiac arrhythmias may have serious consequences; for instance, AFib can lead to blood clots in the heart~\citep{AFDamage}.
Compared with pharmaceutical interventions, catheter-based radiofrequency ablation in cardiac electrophysiology (EP) is the standard surgical intervention for the definitive treatment of rapid cardiac arrhythmias, offering immediate therapeutic effects and high success rates~\citep{jama.2019.0692,nature2021}.
Electrodes are among the most crucial components of a catheter and are used for EP signal collection and catheter localization.
Catheter electrode localization can facilitate intraoperative catheter positioning, surgical planning, and other applications such as 3D model reconstruction.
However, due to unstable imaging quality and the intersections among multiple catheters during the procedure, it is hard for physicians to locate the electrodes precisely in real-time X-ray images.
Thus, it is necessary to develop accurate catheter placement detection methods, not only to reduce the burden on clinicians during surgery but also to help novice doctors become familiar with the procedure.

%% Deep Learning Background (two main problems)

Several works~\citep{ambrosini2017fully, CathSeg, FWNet} formulate this task as segmentation and locate the catheters from the centers of the electrode masks. Other studies adopt single-frame landmark detection (LD) methods. Catheter segmentation information is indeed helpful, but labeling it is time-consuming. Experimentally, we observe that the optical flow map can, to some extent, provide shape and boundary information of the catheter electrodes without specific annotations~\citep{ConTrack}. Moreover, optical flow maps provide temporal context in X-ray videos, making it possible to exploit both the \textit{correlation between successive frames} and the \textit{label-free shape context}.

To solve the electrode localization problem, and motivated by the above empirical findings, we go beyond a single frame
% model this problem as a \textbf{Video Landmark Detection} task 
and propose an effective and easy-to-implement network, {\bf O}ptical {\bf F}low-based {\bf E}lectrode {\bf L}ocal{\bf I}z{\bf A}tion ({\bf OFELIA}) for electrode localization in an X-ray video sequence. 

Specifically, we introduce the optical flow map between consecutive frames as an input to the LD network, which not only captures the position changes over time but also provides estimated shape information of the electrodes (as shown in Fig.~\ref{figs:raft_outputcompare}).
In addition, as the ground truth of optical flow is difficult to acquire in our task, we construct a simulated dataset, the Flying-Catheter Dataset, based on several pre-trained RAFT~\citep{teed2020raft} models, to train the optical flow estimator.

This paper offers the following contributions:

\begin{enumerate}
\vspace{-5pt}
    \item We propose the OFELIA network, which integrates the spatiotemporal information in an X-ray sequence for precise electrode localization. To the best of our knowledge, it is the first to introduce optical flow into electrode localization in an X-ray sequence.
    \item To bridge the gap between natural images and X-ray images, we construct a Flying-Catheter dataset and fine-tune RAFT for accurate optical flow estimation.
    \item Extensive experiments on the test datasets demonstrate that OFELIA outperforms state-of-the-art electrode detection methods on two commonly used catheters.
\end{enumerate}

\section{Related Works}

\noindent\textbf{Optical Flow in Medical Image Analysis.}
Optical flow maps have occasionally been used in medical image analysis. FW-Net~\citep{FWNet} introduces an end-to-end framework that combines a segmentation network, an optical flow network, and a flow-guided warping function to learn temporal continuity for real-time catheter segmentation in a 2D X-ray fluoroscopy sequence. Optical flow maps are also used in~\citep{Flow-Seg} for echocardiography segmentation. FlowReg~\citep{FlowReg} introduces a two-part deep learning system for unsupervised neuroimaging registration, combining 3D affine adjustments and 2D deformable fine-tuning based on the optical flow network to enhance global and local alignment of medical imaging volumes.

\noindent\textbf{Single-Image Landmark Detection.} In~\citep{yao2020miss}, a multi-task U-Net is implemented to predict both the heatmap and the offset maps of landmarks simultaneously. In~\citep{9879107}, an efficient contour-hugging landmark detection method with uncertainty estimation is presented. In~\citep{HQLandmark}, a lightweight universal anatomical landmark detection model is developed.

\noindent\textbf{Video Landmark Detection.}
Compared to single-image landmark detection, video landmark detection utilizes the information between frames.
In~\citep{Ullah}, a tracker is implemented to extract the tip detection results in the last frame as a reference for segmenting the tip in successive frames. U-LanD~\citep{U-LanD} capitalizes on the uncertainty inherent in landmark prediction to achieve automatic detection of landmarks in key frames of videos. 
The most similar work to ours is ConTrack~\citep{ConTrack}, which uses both spatial and temporal context for tip detection and tracking. It incorporates multiple template frames and a search frame for catheter segmentation and initial tip detection. Subsequently, it uses successive segmentation to refine tips with optical flow maps. However, it relies on catheter segmentation masks, necessitating extensive annotations.
In contrast, our OFELIA only requires point annotations, which are much easier to obtain.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
%%
%% methods
%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
\section{Method}

\noindent \textbf{Problem Definition.}
Let $D = \{(X_t, Y_t)\}_{t=1}^{N}$ represent an X-ray video sequence with $N$ frames, where $X_t\in\mathbb{R}^{w\times h}$ is the $t$-th video frame of size $w \times h$, and $Y_t\in\mathbb{R}^{N_e\times w \times h}$ denotes the positions of the $N_e$ landmarks in frame $t$.
Specifically, suppose that the $k$-th landmark of the $t$-th frame is at $(x,y)$. Then $Y_t^k$ is defined as
\begin{equation}
    Y_t^k(i,j) =
        \begin{cases}
            1, & \text{if } i = x \text{ and } j = y; \\
            0, & \text{otherwise.}
        \end{cases}
    \label{Eq_GT}
\end{equation}
OFELIA aims to train a network $f(\cdot)$, which takes $\{X_{t}\}_{t=1}^N$ as input and predicts the locations of the electrodes in each frame, i.e., $\{\hat{Y}_{t}\}_{t=1}^{N}$.


\noindent \textbf{OFELIA Network.}
OFELIA, whose architecture is shown in Fig.~\ref{fig:DEDNet}, aims to predict the landmark positions $Y_t$ using both $X_t$ and $X_{t+1}$. In particular, we exploit the information between the two frames, i.e., the optical flow map.

Optical flow, which measures the motion of objects across consecutive images, is widely used for tracking cells in fluoroscopy~\citep{guo2013red}. By encoding the direction and magnitude of the velocity at each pixel, the optical flow map describes the spatiotemporal information of the electrodes in an X-ray video.

Specifically, we first predict the optical flow map $F_{t\rightarrow t+1}$ between $X_t$ and $X_{t+1}$ using an optical flow estimator $\varphi^*(\cdot)$, which takes two frames as input and predicts the pixel movement between them, i.e., $F_{t\rightarrow t+1} = \varphi^*(X_t, X_{t+1})$.
Then, the raw image $X_t$ and optical flow map $F_{t\rightarrow t+1}$ are concatenated and sent to a modified U-Net~\citep{unet2015}.
The encoder of the U-Net is a pre-trained ResNet-34, while the decoder consists of five up-sampling layers with 512, 256, 128, 64, and 32 channels, respectively.
As there are $N_e$ electrodes to localize in each frame, we add a convolution layer after the last decoder layer to reduce the number of channels to $N_e$. The $k$-th channel represents the predicted localization probability map of the $k$-th electrode, and the position with the highest probability is taken as the final prediction. The loss function $\mathcal{L}$ is defined as the average channel-wise cross-entropy loss between the predicted probability maps $\hat{Y}_{t}$ and the ground truth $Y_{t}$, given as
%\begin{equation}
    $\mathcal{L} = \frac{1}{N_e}\sum_{k=1}^{N_e} \mathcal{L}_{CE}(\hat{Y}_{t}^{k}, Y_{t}^{k})$.
%  \label{Eq_loss}  
%\end{equation}
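
As a hedged illustration of the loss and the prediction rule, one possible reading is sketched here. It assumes, which the text does not state, that each output channel is normalized by a softmax over all pixel positions. The symbols $Z_t^k$ (the raw output of the $k$-th channel) and $(x_k, y_k)$ (the annotated position of the $k$-th electrode) are introduced only for this sketch.
\begin{align*}
    \hat{Y}_t^k(i,j) &= \frac{\exp Z_t^k(i,j)}{\sum_{i',j'} \exp Z_t^k(i',j')}, \\
    \mathcal{L}_{CE}(\hat{Y}_t^k, Y_t^k) &= -\sum_{i,j} Y_t^k(i,j) \log \hat{Y}_t^k(i,j) = -\log \hat{Y}_t^k(x_k, y_k), \\
    (\hat{x}_k, \hat{y}_k) &= \arg\max_{(i,j)} \hat{Y}_t^k(i,j).
\end{align*}
The second equality in the middle line holds because $Y_t^k$ is one-hot (Eq.~(\ref{Eq_GT})). Under this reading, minimizing $\mathcal{L}$ raises the predicted probability at each annotated electrode position.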

\begin{figure}[t]
    \centering    \includegraphics[width=0.9\textwidth]{figs/DEDNet.pdf}
    \caption{Overview of OFELIA. During the training procedure, the optical flow estimator is frozen and only the parameters of the Encoder $E$ and the Decoder $D$ are updated.}
    \label{fig:DEDNet}
    \vspace{-10pt}
\end{figure}

The optical flow map captures the catheter movement over time and provides additional spatial shape information about the electrodes, which would otherwise require manual annotation.
Combining X-ray images and optical flow maps can drive the neural network to pay more attention to the electrode regions in the field of view, which contributes to the precise localization of the electrodes.

\noindent \textbf{Optical Flow Estimator.}
Estimating the optical flow map is essential for the final prediction of OFELIA. However, the ground truth optical flow is inaccessible in our task.
Besides, most available optical flow estimators are trained on natural image datasets, and the large domain gap results in poor predictions.
Thus, we {\bf simultaneously} construct a simulated X-ray optical flow dataset (called Flying-Catheter) and train a task-specific flow estimator on it.
The pipeline is shown in Algorithm~\ref{pseduocode}.

\begin{algorithm}
    \caption{\textbf{Optical Flow Estimator}}
    \label{pseduocode}
    \begin{algorithmic}
        \REQUIRE{Original Dataset: $D_{ori} = \{X_t\}_{t=1}^N$, Pre-trained RAFTs: $\{\varphi^{c}, \varphi^{k}, \varphi^{s}, \varphi^{t}\}$, Original RAFT: $\varphi$, Quality Control Threshold: $\alpha$, Flying-Catheter Dataset: $D_{f-c} = \varnothing$}
        \STATE $t \leftarrow 1$
        \REPEAT
            \STATE Predict Optical Flow between $X_t$ and $X_{t+1}$: $\hat{F}_{t\rightarrow t+1}^p \leftarrow \varphi^p(X_t, X_{t+1}), p\in \{c, k, s, t\}$;
            \STATE Convert $\hat{F}_{t\rightarrow t+1}^p$ into HSV color space;
            \STATE Compute Saturation Factor $\text{SF}^p$ for each flow map using Eq.~(\ref{equ:sf});
            \STATE Find the optical flow map with the highest SF using Eq.~(\ref{equ:highest});
            \IF{$\text{SF}^{p^*} \geq \alpha$}
                \STATE Add sample to the Flying-Catheter Dataset: $D_{f-c} \leftarrow D_{f-c} \cup \{(X_t, X_{t+1}, \hat{F}_{t\rightarrow t+1}^{p^*})\}$;
            \ENDIF
            \STATE $t \leftarrow t + 1$
        \UNTIL{$t > N-1$}
        \STATE Initialize $\varphi^* \leftarrow \varphi$ and fine-tune $\varphi^*$ on $D_{f-c}$ using gradient descent;
        \ENSURE{Fine-tuned RAFT model: $\varphi^{*}$}
    \end{algorithmic}
\end{algorithm}

First, we apply four publicly available pre-trained optical flow estimators to our catheter dataset, namely \textit{raft-chairs}, \textit{raft-kitti}, \textit{raft-sintel}, and \textit{raft-things}, which are variants of RAFT~\citep{teed2020raft} trained on different natural RGB datasets. For each frame $X_t$, the four estimators predict four flows, $\hat{F}_{t\rightarrow t+1}^{p}, p\in\{c,k,s,t\}$. Then, we propose a saturation-channel-based selection algorithm to decide the final pseudo optical flow $\hat{F}_{t\rightarrow t+1}^{*}$ for each frame.
Specifically, the predicted optical flow map, which is coded in the RGB format following RAFT, is first converted to the HSV format. We empirically find that the saturation channel in HSV is good at partially capturing the electrode boundaries. The mean saturation factor (SF) of each map is then calculated as follows:
\begin{equation}
    \text{SF}^p = \frac{1}{N_e}\sum_{i=1}^{N_e}\hat{F}_{t\rightarrow t+1}^{p}(x_i,y_i),
    \label{equ:sf}
\end{equation}
where $(x_i, y_i)$ is the coordinate of the $i$-th electrode landmark in frame $X_t$, and $\hat{F}_{t\rightarrow t+1}^{p}(x_i,y_i)$ is read from the saturation channel of the HSV-converted flow map.
The pseudo optical flow map is defined as the predicted optical flow map with the highest SF:
\begin{equation}
    \hat{F}_{t\rightarrow t+1}^{*} = \hat{F}_{t\rightarrow t+1}^{p^*}, ~~p^* = \arg\max_{p} \text{SF}^p.
    \label{equ:highest}
\end{equation}
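
For reference, the saturation value entering the SF can be written explicitly under the common RGB-to-HSV convention. This is an assumption, since the exact conversion routine is not specified here, and the symbols $r, g, b \in [0,1]$ (the RGB components of a pixel of the color-coded flow map) and $S$ are introduced only for illustration:
\[
    S =
    \begin{cases}
        0, & \text{if } \max(r,g,b) = 0, \\
        \dfrac{\max(r,g,b) - \min(r,g,b)}{\max(r,g,b)}, & \text{otherwise.}
    \end{cases}
\]
Assuming the Middlebury-style color wheel commonly used to visualize optical flow, zero motion maps to white ($S = 0$) and the saturation grows with the normalized flow magnitude. Under that assumption, a high SF indicates that the estimator assigns pronounced motion to the annotated electrode positions.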

However, due to the large gap between natural images and X-ray images, even the best of the four predictions may have low quality. Thus, we conduct a quality control procedure on the constructed dataset by discarding samples with an SF smaller than a threshold $\alpha$.
Finally, we fine-tune the original RAFT on the remaining dataset, denoted as the Flying-Catheter Dataset, for task-specific optical flow estimation.
Compared to the original RAFT and RAFT variants trained on natural image datasets, the RAFT trained on the Flying-Catheter dataset better captures the spatial information of the electrodes, which serves as a strong prior for landmark detection using OFELIA (as shown in Fig.~\ref{figs:raft_outputcompare}).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
%%
%% Experiments
%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Experiments and Results}
%\subsection
\noindent \textbf{Experiment Settings}\\
%\vspace{-5pt}
\noindent\underline{Dataset.}
This study uses an in-house multi-center dataset of fluoroscopic sequences captured during cardiac ablation procedures and animal experiments.
Most of the frames include two types of commonly used catheters, Coronary Sinus (CS) and Radio-Frequency (RF) catheters.
All the landmarks are defined as the center point of the electrodes except for the first landmark of the RF catheter, which is defined as the tip of the RF catheter. This results in 14 landmarks (4 for RF and 10 for CS) in each frame.
The dataset is annotated by two skilled engineers using LabelMe~\citep{russell2008labelme} and reviewed by three professional clinical experts.
The training and test sets consist of 560 sequences (14,768 frames) and 346 sequences (7,711 frames), respectively.
To evaluate the stability and generalization of our proposed method, we extract two clinically challenging (CCA) subsets, which consist of frames of a special scene in the operation (53 sequences, 575 frames, denoted as Test-DSA Subset) and frames where the catheters are partially obstructed (145 sequences, 2,266 frames, denoted as Test-OBS Subset). These two test sets are more difficult for catheter electrode detection as they involve more complex situations.

\noindent\underline{Metrics.}
We use mean radial error (MRE) to measure the Euclidean distance between prediction and ground truth. Additionally, the successful detection rate (SDR) is assessed across three different radii: 1~mm, 2~mm, and 4~mm.
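
For concreteness, the two metrics can be written as follows. This is a sketch under the definitions commonly used in the landmark detection literature. The symbols $M$ (number of evaluated landmarks), $p_m$ and $\hat{p}_m$ (ground-truth and predicted positions of the $m$-th landmark, in mm), and $r$ (radius) are introduced here only for illustration.
\[
    d_m = \lVert \hat{p}_m - p_m \rVert_2, \qquad
    \text{MRE} = \frac{1}{M}\sum_{m=1}^{M} d_m, \qquad
    \text{SDR}(r) = \frac{100\%}{M}\sum_{m=1}^{M} \mathbf{1}\left[d_m \leq r\right],
\]
where $\mathbf{1}[\cdot]$ equals 1 if the condition holds and 0 otherwise. A lower MRE and a higher SDR indicate better localization.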

\noindent\underline{Implementation details.} 
Our model is implemented in PyTorch and trained on an NVIDIA A100 GPU. 
The image pairs are augmented by random rotation, intensity scaling, and elastic deformation, and resized to $640 \times 640$ before being sent to the network.
The network is trained with the Adam optimizer for 20 epochs, with an initial learning rate of 0.001 and a batch size of 4. The learning rate is multiplied by 0.1 at epochs 4, 8, 12, and 16.
The threshold for quality control of the Flying-Catheter Dataset is set to $\alpha=0.5$.
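
As a worked illustration of the learning rate schedule, assume that each decay takes effect at the start of the listed epoch (the indexing convention is not specified). The learning rate at epoch $e$, denoted here by the illustrative symbol $\eta(e)$, is then
\[
    \eta(e) = 10^{-3} \times 0.1^{\,n(e)}, \qquad n(e) = \left|\{\, m \in \{4, 8, 12, 16\} : m \leq e \,\}\right|,
\]
i.e., $10^{-3}$ before epoch 4, $10^{-4}$ for epochs 4--7, $10^{-5}$ for epochs 8--11, $10^{-6}$ for epochs 12--15, and $10^{-7}$ from epoch 16 onward.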

\begin{figure}[t]
    \centering \includegraphics[width=0.8\textwidth]{figs/res_test.png}
    \caption{Qualitative results on the Test set (a), the Test-DSA subset (b), and the Test-OBS subset (c). The ground truth and predicted electrodes of the CS catheter are in \textcolor{blue}{Blue} and \textcolor{yellow}{Yellow}, respectively, and those of the RF catheter are in \textcolor{green}{Green} and \textcolor{red}{Red}, respectively.}
    \label{fig:qualitative_results}
    \vspace{-20pt}
\end{figure}

%\subsection
\noindent \textbf{Main Results}\\
We compare OFELIA with several commonly used algorithms for medical landmark detection~\citep{unet2015,yao2020miss,9879107,HQLandmark}, and the quantitative results are shown in Table~\ref{tab:result}.
% ~\footnote{https://github.com/qsyao/attack_landmark_detection}
% ~\footnote{https://github.com/jfm15/ContourHuggingHeatmaps}
% ~\footnote{https://github.com/MIRACLE-Center/YOLO_Universal_Anatomical_Landmark_Detectio}
We observe that OFELIA outperforms the baseline methods on most of the metrics on the test sets.
This may result from the temporal information introduced by the flow map, as the other methods focus on spatial features only. %learned from a single frame.
In addition, OFELIA generalizes well to the CCA sequences: its SDR drops less than that of most compared methods, which is plausible, since the extra knowledge improves the robustness of the network and helps it handle difficult situations.
We also present qualitative results of different detection methods in Fig.~\ref{fig:qualitative_results}, where the MRE of each image is shown at the top of the image. Our method outperforms other methods significantly. More results are in the Appendix.

\begin{table}[]
\centering
\caption{Results on the Test sets and CCA subsets. \textbf{Best} and \underline{Second Best} are highlighted.}
\label{tab:result}
\resizebox{0.7\textwidth}{!}{
\begin{threeparttable}
\begin{tabular}{lcccccccc}
\toprule
\multicolumn{9}{c}{\textbf{Test Dataset}}                                                                                         \\ \midrule
\multirow{3}{*}{\textbf{Model}} & \multicolumn{4}{c}{\textbf{RF Catheter}}                & \multicolumn{4}{c}{\textbf{CS Catheter}}                 \\
\cmidrule{2-9}
                       & MRE$\pm$STD $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     & MRE$\pm$STD  $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     \\ 
\cmidrule{3-5} \cmidrule{7-9}
                       & (mm)       & 1mm   & 2mm   & 4mm   & (mm)       & 1mm   & 2mm   & 4mm    \\
\midrule
U-Net$^*$                  &1.59$\pm$4.91 	& 84.97 	& 91.62 	& \underline{94.51} & 1.08$\pm$4.84 	& 92.20 	& 95.66 	& 97.11 \\
Yao et al.$^*$             & 3.42$\pm$8.99 & 78.32 & 83.53 & 86.42 & 2.28$\pm$12.65 & 81.79 & 87.57 & 94.22 \\
McCouat et al.$^*$         & 1.46$\pm$3.35 & 82.66 & 91.62 & 93.64 & 1.06$\pm$3.07 & 88.73 & 96.53 & 97.98 \\
Zhu et al.$^*$             &\underline{1.29}$\pm$\underline{3.19} & \underline{86.13} & \underline{92.20} & \underline{94.51} & \underline{0.93}$\pm$\underline{2.85} & \underline{93.06} & \underline{97.69} & \underline{98.27} \\
OFELIA (Ours)          & \textbf{0.95}$\pm$\textbf{2.02}   & \textbf{90.17} & \textbf{95.38} & \textbf{96.82} & \textbf{0.71}$\pm$\textbf{1.75}   & \textbf{95.66} & \textbf{98.27} & \textbf{99.42}  \\
\midrule
\multicolumn{9}{c}{\textbf{Test-DSA Subset}}                                                                                         \\ \midrule
\multirow{3}{*}{\textbf{Model}} & \multicolumn{4}{c}{\textbf{RF Catheter}}                & \multicolumn{4}{c}{\textbf{CS Catheter}}                 \\
\cmidrule{2-9}
                       & MRE$\pm$STD $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}    & MRE$\pm$STD $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     \\ 
\cmidrule{3-5} \cmidrule{7-9}
                       & (mm)     & 1mm   & 2mm   & 4mm   & (mm)    & 1mm   & 2mm   & 4mm    \\
\midrule
U-Net$^*$                  & 3.86$\pm$9.09 & 77.36 & 79.25 & 83.02 & 0.86$\pm$1.20 & 83.02 & \underline{94.34} & \underline{96.23} \\
Yao et al.$^*$             & 6.27$\pm$12.51 & 64.15 & 69.81 & 75.47 & 5.10$\pm$10.53 & 66.04 & 71.70 & 75.47 \\
McCouat et al.$^*$         & 2.65$\pm$6.88 & 81.13 & 84.91 & \underline{88.68} & 0.66$\pm$0.28 & 88.68 & \underline{94.34} & \textbf{100.00} \\
Zhu et al.$^*$            & \underline{2.10}$\pm$\underline{5.29} & \underline{83.02} & \underline{86.79} & \underline{88.68} & \textbf{0.52}$\pm$\underline{0.27} & \underline{94.34} & \textbf{100.00} & \textbf{100.00} \\
OFELIA (Ours)          & \textbf{1.52}$\pm$\textbf{3.30}       & \textbf{86.79} & \textbf{88.68} & \textbf{94.34} & \underline{0.64}$\pm$\textbf{0.19}      & \textbf{96.23} & \textbf{100.00} & \textbf{100.00} \\
\midrule
\multicolumn{9}{c}{\textbf{Test-OBS Subset}}                                                                                         \\ \midrule
\multirow{3}{*}{\textbf{Model}} & \multicolumn{4}{c}{\textbf{RF Catheter}}                & \multicolumn{4}{c}{\textbf{CS Catheter}}                 \\
\cmidrule{2-9}
                       & MRE$\pm$STD $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     & MRE$\pm$STD $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     \\ 
\cmidrule{3-5} \cmidrule{7-9}
                       & (mm)    & 1mm   & 2mm   & 4mm   & (mm)      & 1mm   & 2mm   & 4mm    \\
\midrule
U-Net$^*$                  & 2.85$\pm$6.42 & 73.10 & 81.38 & 84.83 & 1.28$\pm$2.87 & 78.62 & 88.28 & 92.41 \\
Yao et al.$^*$             & 4.43$\pm$10.09 & 71.72 & 78.62 & 81.38 & 3.35$\pm$18.33 & 74.48 & 82.07 & 91.72 \\
McCouat et al.$^*$        & 1.82$\pm$4.35 & 82.07 & 87.59 & 90.34 & \underline{0.85}$\pm$\textbf{1.15} & 86.21 & 91.03 & 95.17 \\
Zhu et al.$^*$             & \underline{1.80}$\pm$\underline{4.29} & \underline{83.45} & \underline{89.66} & \underline{91.03} & 1.27$\pm$3.75 & \underline{88.97} & \underline{93.79} & \underline{96.55} \\
OFELIA (Ours)          & \textbf{1.58}$\pm$\textbf{2.82}    & \textbf{86.21} & \textbf{91.03} & \textbf{93.10} & \textbf{0.73}$\pm$\underline{1.53}   & \textbf{91.72} & \textbf{95.86} & \textbf{97.24}  \\
\bottomrule
\end{tabular}
\begin{tablenotes}
    \item[$^*$] Implemented with the official code.
\end{tablenotes}
\end{threeparttable}
}
\vspace{-10pt}
\end{table}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
%%
%% Abalation
%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
\noindent \textbf{Ablation Study}\\
To evaluate the effectiveness of our proposed method, we conduct several ablation studies and present the results below and in the Appendix.

\noindent\underline{RAFT trained on Flying-Catheter.}
We use the four pre-trained RAFT models and the RAFT fine-tuned on Flying-Catheter (Flying-Catheter RAFT) to predict the optical flow map; the results are shown in Fig.~\ref{figs:raft_outputcompare}. The prediction of the Flying-Catheter RAFT contains more spatial information about the electrodes and has clearer boundaries, which provides a strong prior for landmark detection.

% Why finetune RAFT
\begin{figure}[htb]
  \centering
  \includegraphics[scale=0.4]{figs/flow_comp.png}
  \caption{Estimated optical flow maps of the same frame from different RAFT models. The RAFT trained on Flying-Catheter presents a clearer delineation of the catheters and electrodes.}
  \label{figs:raft_outputcompare}
  \vspace{-15pt}
\end{figure}

\noindent\underline{The introduction of optical flow.} 
The proposed OFELIA takes frame $X_t$ and the corresponding optical flow $F_{t\rightarrow t+1}$ as input.
Here we replace $F_{t\rightarrow t+1}$ with (1) the segmentation component of $X_t$ with the highest probability from the Segment Anything Model (SAM~\citep{sam2023}) without prompts; (2) the subsequent frame $X_{t+1}$; (3) the optical flow map $F_{t\rightarrow t+1}^s$ estimated by a RAFT trained on a natural image dataset; or (4) nothing, i.e., OFELIA without extra information.
The results in Table~\ref{tab:ablation} show that, although aggregating extra information improves landmark detection, optical flow is the most effective choice.
Moreover, optical flow maps generated by the Flying-Catheter RAFT yield better performance than those of the original RAFT, which supports the necessity of fine-tuning RAFT on the constructed dataset.

\begin{table}[]
\centering
\caption{Ablation on the additional information. \textbf{Best} and \underline{Second Best} are highlighted.}
\label{tab:ablation}
\resizebox{0.8\textwidth}{!}{
\begin{tabular}{lcccccccc}
\toprule
\multicolumn{9}{c}{\textbf{Test Dataset}}                                                                                         \\ \midrule
\multirow{3}{*}{\textbf{Information}} & \multicolumn{4}{c}{\textbf{RF Catheter}}                & \multicolumn{4}{c}{\textbf{CS Catheter}}                 \\
\cmidrule{2-9}
                       & MRE$\pm$STD$\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     & MRE$\pm$STD  $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     \\ 
\cmidrule{3-5} \cmidrule{7-9}
                       & (mm)       & 1mm   & 2mm   & 4mm   & (mm)       & 1mm   & 2mm   & 4mm    \\
\midrule
SAM & \underline{1.12}$\pm$\underline{2.57} & \underline{89.88} & \underline{94.51} & 95.09 & \underline{0.82}$\pm$\underline{1.81} & \underline{93.93} & \underline{97.69} & \underline{98.55} \\
Subsequent Frame & 1.32$\pm$3.13 & 85.26 & 93.35 & \underline{95.38} & 1.02$\pm$4.64 & 91.33 & 97.11 & 97.98 \\
Original RAFT & 1.93$\pm$6.25 & 87.57 & 90.17 & 91.62 & 1.22$\pm$2.19 & 89.60 & 94.22 & 97.69 \\
OFELIA (w/o extra info.) & 2.03$\pm$5.00 & 81.79 & 86.42 & 89.60 & 1.38$\pm$4.75 & 80.92 & 89.60 & 95.38 \\
OFELIA (Ours) & \textbf{0.95}$\pm$\textbf{2.02}	&	\textbf{90.17}	&	\textbf{95.38}	&	\textbf{96.82}	&	\textbf{0.71}$\pm$\textbf{1.75}	&		\textbf{95.66}	&	\textbf{98.27}	&	\textbf{99.42}	\\
\bottomrule
\end{tabular}
}
\end{table}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
%%
%% Conclusion and Future Work
%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
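% - Re-summarize the task
% - Summarize the results
% - future work
% - Electrode annotation is labor-intensive; explore more one-shot or few-shot approaches
% - A network that can segment the catheter body more clearly would be more useful
% - Or clearer X-ray images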
\section{Conclusion and Future Work}
Accurate and efficient electrode detection in real-time fluoroscopy is of paramount importance.
In this work, we propose OFELIA, which introduces optical flow features into the pipeline for precise electrode localization in X-ray sequences.
To improve the model's utility and generalizability, we propose a saturation-based optical flow dataset construction algorithm and fine-tune the optical flow estimator on the synthetic dataset.
The results on the test set and two CCA subsets demonstrate the effectiveness of the proposed OFELIA compared with several SOTA methods.
It is worth noting that this approach may not be limited to catheter ablation and could be generalized to other tasks such as moving object detection.
For electrode landmark detection, further research could explore different combinations of loss functions, as well as one-shot or few-shot methods to alleviate the burden of electrode annotation.


% \bibliographystyle{alpha}
% \bibliography{bibliography}
\bibliography{midl24_85}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
\appendix
\section{Visualization of Samples}
\subsection{Visualization of Electrode Landmark Order}
\begin{figure}[h]
  \centering
  \includegraphics[scale=0.4]{figs/rebuttal_cath.png}
  \caption{Visualization of Electrode Landmark Order.}
  \label{figs:rebuttal_cathorder}
\vspace{-5pt}
\end{figure}
\subsection{Samples from Test-DSA Subset}
\vspace{-5pt}
\begin{figure}[h]
  \centering
  \includegraphics[scale=0.45]{figs/res_dsa_subset.png}
  \caption{Qualitative results on the Test-DSA Subset. The ground truth and predicted landmarks of the CS catheter are in \textcolor{blue}{Blue} and \textcolor{yellow}{Yellow}. The ground truth and predicted landmarks of the RF catheter are in \textcolor{green}{Green} and \textcolor{red}{Red}.}
  \label{figs:rebuttal_dsa}
\end{figure}
\vspace{-5pt}
\newpage
\subsection{Samples from Test-OBS Subset}
\begin{figure}[htb]
  \centering
  \includegraphics[scale=0.45]{figs/res_obs_subset.png}
  \caption{Qualitative results on the Test-OBS Subset. The ground truth and predicted landmarks of the CS catheter are in \textcolor{blue}{Blue} and \textcolor{yellow}{Yellow}. The ground truth and predicted landmarks of the RF catheter are in \textcolor{green}{Green} and \textcolor{red}{Red}.}
  \label{figs:rebuttal_obs}
  \vspace{-10pt}
\end{figure}

\subsection{Samples with Part of Landmarks}
\begin{figure}[h]
  \centering
  \includegraphics[scale=0.4]{figs/rebuttal_singlecath.png}
  \caption{Qualitative results for single-catheter cases. The ground-truth and predicted landmarks of the CS catheter are shown in \textcolor{blue}{Blue} and \textcolor{yellow}{Yellow}, respectively; those of the RF catheter are shown in \textcolor{green}{Green} and \textcolor{red}{Red}, respectively.}
  \label{figs:rebuttal_singlecath}
  \vspace{-10pt}
\end{figure}
In our study, the highest value in each heatmap can be used to judge whether the predicted landmark is reliable. As shown in Fig.~\ref{figs:rebuttal_singlecath}(a), if the maximum values of the heatmaps for all electrodes of the CS catheter fall below a fixed threshold, we conclude that no CS catheter is present in the current frame. This threshold, determined through our statistical analysis, is 30 (before normalization).
% The maximum value in the heatmap can be used to tell whether a predicted landmark is reliable
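
As a minimal formalization of this rule, let $H_k$ denote the unnormalized heatmap predicted for the $k$-th of the $K$ CS electrodes and let $p$ range over pixel locations (the symbols $H_k$, $K$, $p$, and $\tau$ are introduced here only for illustration and are not notation from the main text). The CS catheter is then treated as absent from the current frame when
\[
  \max_{1 \le k \le K} \; \max_{p} \, H_k(p) \;<\; \tau, \qquad \tau = 30 .
\]
The same test, applied to the heatmaps of the RF electrodes, would presumably handle frames without an RF catheter, although only the CS case is illustrated here.
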
\section{Visualization of Failure Detection Cases}
\begin{figure}[h]
  \centering
  \includegraphics[scale=0.45]{figs/rebuttal_failure.png}
  \caption{Qualitative results for failure cases. The ground-truth and predicted landmarks of the CS catheter are shown in \textcolor{blue}{Blue} and \textcolor{yellow}{Yellow}, respectively; those of the RF catheter are shown in \textcolor{green}{Green} and \textcolor{red}{Red}, respectively.}
  \label{figs:rebuttal_failurecases}
  \vspace{-10pt}
\end{figure}
\newpage
\section{Statistical Significance Testing of the Results}
\subsection{Results on Test Set}
\begin{figure}[h]
  \centering
  \includegraphics[width=0.6\textwidth]{figs/STATICS_testset-2.png}
  \caption{Statistical analysis results on the test set.}
  \label{figs:STATICS_testset}
  \vspace{-10pt}
\end{figure}
\subsection{Results on Test-DSA Subset}
\begin{figure}[h]
  \centering
  \includegraphics[width=0.6\textwidth]{figs/STATICS_DSAset-2.png}
  \caption{Statistical analysis results on the Test-DSA Subset.}
  \label{figs:STATICS_DSAset}
  \vspace{-10pt}
\end{figure}
\newpage
\subsection{Results on Test-OBS Subset}
\begin{figure}[h]
  \centering
  \includegraphics[width=0.6\textwidth]{figs/STATICS_OBSset-2.png}
  \caption{Statistical analysis results on the Test-OBS Subset.}
  \label{figs:STATICS_OBSset}
  \vspace{-10pt}
\end{figure}
\section{Extra Ablation Study}
\subsection{Ablation Study on $\alpha$}
\vspace{-10pt}
\begin{table}[h]
\centering
\caption{Ablation results on $\alpha$. \textbf{Best} and \underline{Second Best} are highlighted.}
\label{tab:ablation_alpha}
\resizebox{0.7\textwidth}{!}{
\begin{tabular}{lcccccccc}
\toprule
\multicolumn{9}{c}{\textbf{Test Dataset}}                                                \\ \midrule
\multirow{3}{*}{\textbf{Model}} & \multicolumn{4}{c}{\textbf{RF Catheter}}                & \multicolumn{4}{c}{\textbf{CS Catheter}}                 \\
\cmidrule{2-9}
                       & MRE$\pm$STD  $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     & MRE$\pm$STD   $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     \\ 
\cmidrule{3-5} \cmidrule{7-9}
                       & (mm)       & 1mm   & 2mm   & 4mm   & (mm)       & 1mm   & 2mm   & 4mm    \\
\midrule
$\alpha=0$           & 1.93$\pm$6.25 & 69.65 & 84.68 & 91.62 & 1.42$\pm$7.19 & 88.15 & 93.35 & 97.69 \\
$\alpha=0.25$        & \underline{1.75}$\pm$\underline{5.99} & \underline{87.57} & \underline{90.17} & \underline{94.80} & \underline{1.22}$\pm$\underline{4.56} & \underline{89.88} & \underline{94.22} & \underline{98.27} \\
$\alpha=0.5$ (Ours)  & \textbf{0.95}$\pm$\textbf{2.02} & \textbf{90.17} & \textbf{95.38} & \textbf{96.82} & \textbf{0.71}$\pm$\textbf{1.75} & \textbf{95.66} & \textbf{98.27} & \textbf{99.42} \\
$\alpha=0.75$        & 3.60$\pm$7.57 & 58.38 & 71.39 & 84.39 & 3.04$\pm$6.10 & 66.18 & 71.10 & 81.21 \\
\bottomrule
\end{tabular}
}
\end{table}
\subsection{Ablation Study on the Number of Frames Used}
\begin{table}[h]
\centering
\caption{Number of frames used for RAFT fine-tuning at each value of $\alpha$.}
\label{tab:ablation_alphanumber}
\resizebox{0.4\textwidth}{!}{
\begin{tabular}{lllll}
\toprule
$\alpha$ & 0 & 0.25 & 0.5 & 0.75 \\
\midrule
Number & 14,768 & 10,345 & 6,212 & 315\\
\bottomrule
\end{tabular}}
\end{table}
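
To put the counts in Table~\ref{tab:ablation_alphanumber} in proportion, the following worked ratios take the $\alpha=0$ count of 14,768 frames as the reference (an assumption made here for illustration; the table itself reports only the absolute counts):
\[
  \frac{10{,}345}{14{,}768} \approx 0.70, \qquad
  \frac{6{,}212}{14{,}768} \approx 0.42, \qquad
  \frac{315}{14{,}768} \approx 0.02 .
\]
Under this reading, $\alpha=0.5$ keeps roughly two fifths of the frames, whereas $\alpha=0.75$ keeps only about one in fifty.
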
\newpage
\subsection{Ablation Study on the Four Optical Flow Estimators}
\vspace{-10pt}
\begin{table}[h]
\centering
\caption{Ablation results on the Four Optical Flow Estimators. \textbf{Best} and \underline{Second Best} are highlighted.}
\label{tab:ablation_flow}
\resizebox{0.7\textwidth}{!}{
\begin{tabular}{lcccccccc}
\toprule
\multicolumn{9}{c}{\textbf{Test Dataset}}                                                \\ \midrule
\multirow{3}{*}{\textbf{Model}} & \multicolumn{4}{c}{\textbf{RF Catheter}}                & \multicolumn{4}{c}{\textbf{CS Catheter}}                 \\
\cmidrule{2-9}
                       & MRE$\pm$STD  $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     & MRE$\pm$STD   $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     \\ 
\cmidrule{3-5} \cmidrule{7-9}
                       & (mm)       & 1mm   & 2mm   & 4mm   & (mm)       & 1mm   & 2mm   & 4mm    \\
\midrule
RAFT chairs             & 2.22$\pm$\underline{5.41} & 81.79 & 85.55 & 87.57 & \underline{0.90}$\pm$\underline{2.20} & 86.99 & \underline{93.35} & 96.24 \\
RAFT kitti              & 2.40$\pm$7.08 & 75.72 & 88.73 & \underline{91.62} & 2.68$\pm$6.34 & 77.46 & 90.17 & 93.93 \\
RAFT things             & 2.66$\pm$9.37 & 78.61 & 87.57 & 89.60 & 2.21$\pm$8.24 & 77.75 & 85.84 & 93.35 \\
RAFT sintel            & \underline{1.93}$\pm$6.25 & \underline{87.57} & \underline{90.17} & \underline{91.62} & 1.42$\pm$7.19 & \underline{88.15} & \underline{93.35} & \underline{97.69} \\
OFELIA (Ours)          & \textbf{0.95}$\pm$\textbf{2.02}	&	\textbf{90.17}	&	\textbf{95.38}	&	\textbf{96.82}	&	\textbf{0.71}$\pm$\textbf{1.75}	&		\textbf{95.66}	&	\textbf{98.27}	&	\textbf{99.42}	\\
\bottomrule
\end{tabular}
}
\end{table}
\subsection{Ablation Study on Using Longer Frame Stacks}
\vspace{-10pt}
\begin{table}[h]
\centering
\caption{Ablation results on Using Longer Frame Stacks. Frame123$\rightarrow$2 means that we take frames $X_{t-1}$, $X_t$, and $X_{t+1}$ as input to predict landmark positions on frame $X_t$. Frame12$\rightarrow$1, Frame1234$\rightarrow$2, and Frame12345$\rightarrow$3 are defined analogously. \textbf{Best} and \underline{Second Best} are highlighted.}
\label{tab:ablation_framestack}
\resizebox{0.7\textwidth}{!}{
\begin{tabular}{lcccccccc}
\toprule
\multicolumn{9}{c}{\textbf{Test Dataset}}                                                \\ \midrule
\multirow{3}{*}{\textbf{Model}} & \multicolumn{4}{c}{\textbf{RF Catheter}}                & \multicolumn{4}{c}{\textbf{CS Catheter}}                 \\
\cmidrule{2-9}
                       & MRE$\pm$STD  $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     & MRE$\pm$STD   $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     \\ 
\cmidrule{3-5} \cmidrule{7-9}
                       & (mm)       & 1mm   & 2mm   & 4mm   & (mm)       & 1mm   & 2mm   & 4mm    \\
\midrule
Frame12$\rightarrow$1             & \underline{2.67}$\pm$\underline{4.39} & \underline{81.21} & \underline{86.13} & \underline{89.31} & \underline{1.88}$\pm$\underline{2.96} & \underline{86.42} & \underline{92.49} & \underline{96.82} \\
Frame123$\rightarrow$2            & 4.24$\pm$8.84 & 75.72 & 84.68 & 88.73 & 2.21$\pm$3.06 & 69.08 & 78.90 & 87.28 \\
Frame1234$\rightarrow$2           & 4.33$\pm$5.80 & 68.79 & 74.86 & 79.48 & 3.06$\pm$5.54 & 77.46 & 85.55 & 91.62 \\
Frame12345$\rightarrow$3          & 3.15$\pm$4.60 & 70.52 & 82.08 & 87.86 & 2.50$\pm$5.13 & 71.39 & 76.59 & 87.28 \\
OFELIA (Ours)          & \textbf{0.95}$\pm$\textbf{2.02}	&	\textbf{90.17}	&	\textbf{95.38}	&	\textbf{96.82}	&	\textbf{0.71}$\pm$\textbf{1.75}	&		\textbf{95.66}	&	\textbf{98.27}	&	\textbf{99.42}	\\
\bottomrule
\end{tabular}
}
\end{table}

\subsection{Ablation Study on Using a Held-Out Center}
\vspace{-10pt}
\begin{table}[h]
\centering
\caption{Ablation results on Using a Held-Out Center. To strengthen the evaluation, we select one center from our multi-center dataset as a held-out center. The data from this center are not used for training and are used exclusively for testing. This yields a new test set, which we call the Test-Plus Dataset, containing 360 sequences. \textbf{Best} and \underline{Second Best} are highlighted.}
\label{tab:ablation_heldout}
\resizebox{0.7\textwidth}{!}{
\begin{tabular}{lcccccccc}
\toprule
\multicolumn{9}{c}{\textbf{Test-Plus Dataset}}                                                \\ \midrule
\multirow{3}{*}{\textbf{Model}} & \multicolumn{4}{c}{\textbf{RF Catheter}}                & \multicolumn{4}{c}{\textbf{CS Catheter}}                 \\
\cmidrule{2-9}
                       & MRE$\pm$STD  $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     & MRE$\pm$STD   $\downarrow$        & \multicolumn{3}{c}{SDR (\%)$\uparrow$}     \\ 
\cmidrule{3-5} \cmidrule{7-9}
                       & (mm)       & 1mm   & 2mm   & 4mm   & (mm)       & 1mm   & 2mm   & 4mm    \\
\midrule
U-Net$^*$                  & 2.86$\pm$6.63 & 71.67 & 80.00 & 86.67 & 1.60$\pm$4.94 & 78.61 & 87.22 & 93.33 \\
Yao et al.$^*$             & 5.95$\pm$11.37 & 61.67 & 67.22 & 74.17 & 1.16$\pm$\underline{1.53} & 70.83 & 83.89 & 96.67 \\
McCouat et al.$^*$         & 2.35$\pm$11.25 & \underline{86.67} & \underline{91.67} & \underline{93.33} & 0.91$\pm$5.68 & \underline{91.67} & \underline{95.83} & \underline{97.50} \\
Zhu et al.$^*$             & \underline{1.45}$\pm$\underline{3.55} & 83.61 & 88.61 & 91.94 & \underline{0.83}$\pm$\textbf{1.46} & 87.22 & 93.33 & 95.28 \\
OFELIA (Ours)          & \textbf{1.18}$\pm$\textbf{2.82} & \textbf{91.67} & \textbf{93.33} & \textbf{95.00} & \textbf{0.73}$\pm$\underline{1.53} & \textbf{92.50} & \textbf{96.67} & \textbf{98.33} \\
\bottomrule
\end{tabular}
}
\end{table}


\vspace{-5pt}
\newpage
\section{Visualization of Optical Flows with Different $\alpha$ and Corresponding Catheter Segmentation Maps}
\vspace{-5pt}
\begin{figure}[htb]
  \centering
  \includegraphics[scale=0.4]{figs/rebuttal_flowdice.png}
  \caption{Visualization of Optical Flows with Different $\alpha$. The segmentation masks are generated by SAM~\citep{sam2023} using the landmarks as prompts.}
  \label{figs:rebuttal_flowdice}
  \vspace{-5pt}
\end{figure}
In our study, segmentation masks of the catheters are not available. To better illustrate the effect of $\alpha$, we visualize the optical flows obtained with different $\alpha$ together with the corresponding catheter segmentation generated by SAM~\citep{sam2023}. As shown in Fig.~\ref{figs:rebuttal_flowdice}, the optical flow maps selected by $\alpha\ge0.5$ show similarities with the segmentation maps (Dice $\ge$ 0.5).
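
For reference, the Dice score quoted above is assumed to follow the standard definition. Writing $F$ for the set of pixels in a binarized flow map and $M$ for the set of pixels in the SAM mask (both symbols, and the binarization step itself, are illustrative assumptions rather than details specified here),
\[
  \mathrm{Dice}(F, M) \;=\; \frac{2\,|F \cap M|}{|F| + |M|},
\]
which equals 1 when the two regions coincide and 0 when they are disjoint.
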
% \section{Visualiztion of Estimated Optical FLows with Catheter Segmentation Maps}
% Manually annotate some, and add Dice on the flow maps?

\section{Details about Dataset Constructions}
Two clinically challenging (CCA) subsets are manually selected from the entire test set by clinical experts, following ConTrack~\citep{ConTrack}, which splits its private test dataset into several types according to different scenarios. In our study, the Test-DSA Subset covers X-rays acquired under angiography, where the injection of contrast agents produces non-uniform dark shadows that move with the bloodstream and may obstruct the field of view. The Test-OBS Subset covers X-rays in which catheters obscure each other or are obscured by external wires, patches, etc. Both subsets make the detection of catheter electrodes more difficult.
\section{Details about RAFT Fine-tuning}
When tackling video landmark detection, we store the optical flow maps in RGB video format to better inspect video quality, following the conversion process provided by RAFT~\citep{teed2020raft}. During our analysis, we observe that the optical flow map, especially its saturation channel, provides shape information about the catheter electrodes to some extent. This saturation map is derived by converting the RGB optical flow images into the HSV format. Our analysis further reveals that sufficient shape information is available from the optical flow images when $\alpha$ is greater than 0.5. However, an excessively high $\alpha$ drastically reduces the amount of data available for fine-tuning (6,212 frames at $\alpha=0.5$ versus 315 frames at $\alpha=0.75$), while too low an $\alpha$ introduces noisy data. Therefore, we build the flying catheter dataset with 0.5 as the threshold value and fine-tune from the \textit{RAFT sintel} model, because it yields a higher mean SF than the other three RAFT models.
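
As background on what the saturation channel measures, the standard RGB-to-HSV conversion is recalled below, with $R$, $G$, and $B$ the channel values of one pixel of the flow image (we assume this standard definition is the one applied; the symbols are introduced here for illustration):
\[
  S \;=\;
  \begin{cases}
    \dfrac{\max(R,G,B) - \min(R,G,B)}{\max(R,G,B)}, & \max(R,G,B) > 0, \\[6pt]
    0, & \text{otherwise.}
  \end{cases}
\]
A gray or white pixel, whose three channels are equal, therefore has $S=0$, while a fully saturated color has $S=1$.
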
\end{document}
