\section{Introduction}
Nailfold Capillaroscopy (NFC) is a non-invasive imaging modality used clinically to assess the health of the microcirculatory system by visualizing capillary structures near the surface of the skin, particularly at the nailfold of the human finger~\cite{hahn1998hemodynamics,etehad2015nailfold,10.1007/978-3-319-10404-1_82}. The technique provides critical insight into capillary morphology, making it indispensable for diagnosing and monitoring a range of autoimmune and vascular conditions, including Systemic Sclerosis (SSc)~\cite{lambova2019nailfold,ruaro2018advances} and Raynaud's phenomenon~\cite{ruaro2019innovations,pauling2019raynaud}. Moreover, emerging research indicates that NFC abnormalities may correlate with metabolic disorders such as diabetes~\cite{okabe2024relationship,shah2023nailfold}, further extending its diagnostic relevance.

\begin{figure*}[htbp]
\centerline{\includegraphics[width=\textwidth]{images/mtl-comparisons.PNG}}
\caption{The traditional NFC workflow \textbf{(A)} requires significant clinician intervention. Existing deep learning approaches \textbf{(B)} require clinicians to pass images through multiple separate models, whereas our proposed multi-task learning model \textbf{(C)} integrates the key tasks into a unified model that produces a comprehensive capillary image analysis in a single operation. Note: the A\textcircled{1} image is obtained from~\cite{Jee00416}.}
\label{fig:mtl-comparisons}
\end{figure*}

In conventional clinical practice, NFC examinations involve capturing microscopic images at approximately $\times 200$ magnification, followed by detailed manual analysis. Specialists visually inspect these images for morphological abnormalities, delineate capillary boundaries, classify capillary morphology, and measure parameters such as apical, arterial, and venous diameters. They then compare these morphological and quantitative findings against reference criteria, or rely on clinical experience, to identify abnormalities indicative of disease (\autoref{fig:mtl-comparisons} \textbf{(A)}). While effective, this manual assessment is inherently \textit{subjective, labor-intensive, time-consuming}, and highly \textit{dependent} on clinician expertise, which can lead to diagnostic variability, inconsistent image interpretation, and delayed or inaccurate clinical decisions~\cite{berks_automated_2018}. The manual approach also places a significant strain on clinical resources, restricting NFC's accessibility and broader clinical adoption.

Machine learning is increasingly applied to nailfold capillaroscopy to support accurate automated diagnosis. A common starting point is segmentation, which outlines the target capillaries in an input image. While not strictly necessary for morphological estimation or parameter calculation, segmentation strengthens deep learning pipelines by improving capillary localization~\cite{bharathi2023deep}. Neural networks such as U-Net~\cite{ronneberger2015u}, Mask R-CNN~\cite{he2017mask}, and their variants~\cite{LIU2020104011,qin2020u2} are widely used for NFC segmentation.

Beyond segmentation, capillary classification also plays a crucial role in identifying different capillary types, such as normal, abnormal, or those with conjunctions or anastomoses~\cite{gracia2022challenge,zhao2024comprehensive}. CNN-based approaches have proven effective for this task. Meanwhile, the quantification of capillary parameters, including density, loop width, and arterial/venous length, is essential for diagnosing diseases like systemic sclerosis, lupus, and rheumatoid arthritis. Some studies favor traditional computer vision or mathematical methods for this analysis~\cite{kim2020automated,gracia2022challenge}.

Recent NFC studies also favor keypoint-based quantitative analysis. For example, Tello et al.~\cite{gracia2022challenge} employed a stacked DenseNet for two-stage capillary parameter estimation, achieving 88\% accuracy at a confidence threshold of 0.50. Zhao et al.~\cite{zhao2024comprehensive} combined Mask R-CNN with a matching algorithm, reporting a mean absolute error (MAE) of 1.674 pixels and a root mean square error (RMSE) of 2.023 pixels for the apical diameter. These results suggest that integrating keypoint estimation into NFC analysis can improve both accuracy and efficiency.
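
For reference, the two error metrics quoted above follow their standard definitions. As a notational sketch (the symbols are introduced here for illustration only and are not taken from the cited works), let $d_i$ denote the reference diameter of the $i$-th capillary in pixels, $\hat{d}_i$ the corresponding predicted diameter, and $N$ the number of evaluated capillaries:
\[
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| \hat{d}_i - d_i \right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( \hat{d}_i - d_i \right)^{2}}.
\]
Because RMSE squares each residual before averaging, it penalizes large individual errors more heavily than MAE, and $\mathrm{RMSE} \geq \mathrm{MAE}$ always holds.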

Despite successes on individual NFC tasks, existing methods do not address multiple tasks \textbf{simultaneously}. Studies in related medical imaging domains, such as retinal fundus~\cite{yi2023multi} and skin lesion~\cite{thwin2024enhanced} analysis, have demonstrated strong connections between related tasks like classification and segmentation. General imaging applications, including human pose estimation, also indicate a strong link between keypoint estimation and segmentation~\cite{Geng_2021_CVPR}. These findings suggest that NFC tasks could likewise benefit from being learned jointly.

To bridge this gap in existing NFC research, we introduce a novel Multi-Task Learning (\modelName) strategy, depicted in~\autoref{fig:mtl-comparisons}\textbf{(C)}, that integrates capillary \textit{semantic segmentation}, \textit{keypoint detection}, and \textit{classification} into a single unified model. Built on a Multiscale Vision Transformer (MViT) backbone with a Feature Pyramid Network (FPN), our model uses a specialized loss function to optimize all task predictions simultaneously. Evaluations on the ANFC dataset~\cite{zhao2024comprehensive} demonstrate balanced performance improvements across tasks. Ablation studies further confirm the precision gains from task unification, with the unified model achieving sub-pixel accuracy ($<1$ pixel error) in downstream capillary parameter estimation.

Our contributions are summarized as follows: \textbf{1)} We propose, to the best of our knowledge, the first reliable multi-task learning model for NFC image tasks, simultaneously performing precise segmentation, classification, and keypoint estimation. \textbf{2)} We introduce the \textbf{MViT-FPN} model, which outperforms existing approaches on NFC imaging tasks. \textbf{3)} Through extensive experiments, we show that \modelName{} achieves superior performance in capillary entity estimation and parameter computation, especially when keypoint detection is jointly learned.
