
% This is a modified version of Springer's LNCS template suitable for anonymized MICCAI 2025 main conference submissions. 
% Original file: samplepaper.tex, a sample chapter demonstrating the LLNCS macro package for Springer Computer Science proceedings; Version 2.21 of 2022/01/12

\documentclass[runningheads]{llncs}
%
\usepackage[T1]{fontenc}
% T1 fonts will be used to generate the final print and online PDFs,
% so please use T1 fonts in your manuscript whenever possible.
% Other font encodings may result in incorrect characters.
%
\usepackage{graphicx,verbatim}
\usepackage{amsmath}
\usepackage{booktabs}
\usepackage{hyperref}
% Used for displaying a sample figure. If possible, figure files should
% be included in EPS format.
%
% If you use the hyperref package, please uncomment the following two lines
% to display URLs in blue roman font according to Springer's eBook style:
%\usepackage{color}
%\renewcommand\UrlFont{\color{blue}\rmfamily}
%\urlstyle{rm}
%
\begin{document}
%
\title{Synthetic Data Generation for Automated Hair Instance
    Segmentation in Trichoscopy}
\titlerunning{Synthetic Data Generation for Hair Instance
    Segmentation in Trichoscopy}
% If the paper title is too long for the running head, you can set
% an abbreviated paper title here
%

\author{
Mohamed Ali Ben Youssef\inst{1} \and
Khoi Tran Dang\inst{1} \and
Duc Thang Nguyen\inst{1} \and
Awatef Kelati\inst{2,3,4} \and
Hang Nguyen\inst{1}
}

\authorrunning{M.A. Ben Youssef et al.}

\institute{
Belle.ai \and
Dermatology Department, University Hospital Cheikh Khalifa, Morocco \and
Dermatology Department, University Hospital Mohammed VI, Morocco \and
Faculty of Medicine, Mohammed VI University of Health and Sciences (UM6SS), Casablanca, Morocco
}
  
\maketitle              % typeset the header of the contribution

\begin{abstract}

Quantitative trichoscopy requires reliable localization of individual hair shafts, but dense per-hair annotation is costly, especially for thin, overlapping structures. We propose a scale-aware synthetic compositing pipeline for hair-shaft instance segmentation. Reusable hair crops are placed onto inpainted hair-free scalp backgrounds, generating 20,000 synthetic images with exact per-hair instance masks. A follicle-diameter calibration factor normalizes scale during synthesis and inference across devices and magnification settings. Evaluated on 365 real trichoscopy images, a model trained exclusively on synthetic data achieves Box mAP50 = 0.52 and Jaccard = 0.54. A union of complementary inference variants improves Box mAP50 while matching the best single-variant Jaccard. Compared with a semantic segmentation baseline, our method reduces count MAE by 43\% from 24.2 to 13.7 hairs and improves counting correlation from 0.87 to 0.93. These results suggest that scale-aware synthetic compositing can reduce annotation cost while enabling effective per-shaft localization.
\keywords{Trichoscopy \and Instance segmentation \and
Synthetic data \and Inpainting \and Alopecia}
\end{abstract}

%
%
%
\section{Introduction}

Trichoscopy and videodermoscopy are widely used as non-invasive techniques for
evaluating hair and scalp disorders.
By magnifying the scalp surface, they allow clinicians to
assess hair shaft diameter, hair density, follicular-unit structure, shaft
morphology, and perifollicular patterns that are useful in diagnosing and
monitoring conditions such as androgenetic alopecia, alopecia areata, and
cicatricial alopecias~\cite{rudnicka2012atlas,zengarini2026trichoscopy}.
These quantitative and morphological cues are increasingly central to
treatment monitoring and objective disease assessment, yet in routine
practice, many measurements still depend on manual inspection or
device-specific software whose datasets and validation protocols are not
always publicly available.

Recent computational trichoscopy studies have explored automated hair density estimation, shaft-diameter analysis, follicular-unit quantification, alopecia severity prediction, and disease classification from scalp images~\cite{gao2022hair,kim2022automated,zengarini2026trichoscopy,chang2020scalpeye,urban2021scalp,lee2020alopecia}. These advances demonstrate the growing potential of image analysis for objective and reproducible hair and scalp assessment. Most existing approaches, however, rely on semantic segmentation or follicle-level annotations that separate hair from scalp without explicitly identifying individual hair shafts.

While semantic segmentation is sufficient for estimating overall hair coverage, it is inherently limited for quantitative trichoscopy. Clinically relevant measurements such as hair count, shaft length, diameter distribution, orientation, and growth pattern require the identification of individual hairs rather than a single foreground mask. In principle, connected-component analysis or Hough transforms can be applied to semantic segmentation outputs to approximate individual shafts~\cite{shih2014hair,kim2024scalpvision}, but this strategy is highly sensitive to threshold selection and image characteristics. Trichoscopy images exhibit substantial variability in hair colour, shaft thickness, scalp pigmentation, acquisition devices, and magnification settings, making it difficult for a fixed postprocessing pipeline to generalize across datasets. Furthermore, overlapping and crossing hairs frequently merge into a single connected region, preventing reliable separation of individual shafts.

These limitations motivate the use of instance segmentation, which explicitly separates each hair shaft and directly provides the object-level representation required for quantitative analysis. However, training modern instance segmentation models~\cite{he2017mask,ultralytics_yolo26} requires dense per-instance annotations, and generating such labels is particularly challenging in trichoscopy. Hair shafts are often only a few pixels wide, extend across large portions of the image, and frequently overlap or exhibit weak contrast against the scalp. As a result, manually outlining every visible hair in an image is extremely time-consuming and difficult to scale. Consequently, publicly available trichoscopy datasets containing dense per-hair instance masks remain scarce, creating a major obstacle for the development and evaluation of hair-shaft instance segmentation methods.



Synthetic data generation offers a promising solution to this annotation bottleneck. By generating images together with automatically derived ground-truth labels, synthetic datasets can provide dense supervision at a fraction of the cost of manual annotation. Synthetic data has been widely adopted in medical imaging through GAN-based augmentation~\cite{goodfellow2014gan,frid2018gan,shin2018medical,yi2019generative}, copy-paste compositing~\cite{dwibedi2017cut,ghiasi2021copypaste}, and domain-randomization approaches~\cite{tobin2017domain}. In dermatology and trichoscopy, where relevant structures are often small, repetitive, and expensive to annotate, such strategies are especially valuable.

Despite this progress, existing synthetic-data research in trichoscopy has focused almost exclusively on semantic segmentation tasks, typically generating binary hair-versus-scalp labels~\cite{shih2014hair,kim2024scalpvision}. To the best of our knowledge, little attention has been given to generating synthetic datasets with dense per-hair instance annotations suitable for training instance segmentation models. Creating such datasets presents additional challenges beyond those encountered in semantic segmentation. In compositing-based synthesis, for example, hair fragments extracted from images acquired at different magnifications cannot simply be pasted onto arbitrary backgrounds without introducing physically unrealistic scale relationships. A hair segment originating from a $\times 20$ image and pasted into a $\times 70$ image will appear at an incorrect width relative to the surrounding scalp texture and follicular structures, creating artifacts that may impair model generalization. Addressing these challenges is therefore essential for developing realistic synthetic datasets that can support robust hair-shaft instance segmentation.


In this work, we address both the annotation bottleneck and the
scale-mismatch problem through a scale-aware synthetic compositing
framework for hair instance segmentation in trichoscopy.
We first remove visible hair from real scalp images using an inpainting
model~\cite{suvorov2022lama} to obtain backgrounds that preserve realistic
skin texture, illumination, and follicular appearance.
We then composite a reusable library of individual hair segments onto these
backgrounds. Since the placement is fully controlled, every generated image comes with exact per-hair instance masks, with no manual
labelling required.
To ensure scale consistency across devices and magnification settings, we
estimate a follicle-diameter-based scale factor for each image and apply it
both during synthetic image generation and during inference-time
preprocessing.

The main contributions of this work are as follows:
\begin{itemize}
\item We present, to the best of our knowledge, the first synthetic-data generation framework specifically designed for \emph{per-hair instance segmentation} in trichoscopy, addressing the scarcity of densely annotated datasets for this task.

\item We develop a realistic compositing pipeline that combines real hair-shaft instances and real scalp backgrounds, producing synthetic trichoscopy images together with exact per-hair instance masks.

\item We introduce a magnification normalization and scale-adaptation strategy that accounts for variations in imaging devices and acquisition settings, ensuring anatomically consistent hair widths and realistic hair--scalp relationships during synthesis.

\item Our framework substantially reduces annotation effort by eliminating the need for exhaustive per-image hair-shaft labeling while enabling scalable generation of large instance-segmented training datasets.

\item We publicly release a synthetic trichoscopy dataset with dense per-hair instance annotations to support future research on hair analysis and instance segmentation in dermatological imaging.

\end{itemize}



\section{Synthetic Hair Generation Flow}
Figure~\ref{fig:synthetic_hair_flow} presents the overall workflow of the proposed synthetic trichoscopy data generation framework. We first estimate the physical scale of each image using hair-follicle detection, enabling normalization across different devices and magnification settings. This scale information serves two purposes. First, it allows hair instances and scalp backgrounds to be represented in a common physical space during synthetic data generation. Second, it enables scale normalization during model training and inference, reducing variability introduced by heterogeneous acquisition settings and allowing the segmentation model to learn from a controlled range of magnifications.

In parallel, hair segmentation masks are used to guide an inpainting model that removes visible hair shafts and generates realistic hair-free scalp backgrounds. Individual hair instances are then extracted from the annotated images and stored as reusable hair-shaft templates. During synthesis, extracted hair instances are resized according to the estimated scale of the target scalp image and composited onto the inpainted scalp background, ensuring anatomically plausible hair widths and consistent hair--scalp relationships. The resulting synthetic images are accompanied by automatically generated dense per-hair instance segmentation labels. By normalizing image scale throughout both data generation and model deployment, the proposed framework improves robustness across devices and magnification levels while substantially reducing the need for exhaustive manual annotation.


\begin{figure}
    \centering
    \includegraphics[width=0.7\linewidth]{figures/synth hair flow.png}
    \caption{Overview of the proposed scale-aware synthetic trichoscopy generation framework.}
    \label{fig:synthetic_hair_flow}
\end{figure}

\subsection{Hair-Free Scalp Background (Scalp Images)}
\label{sec:background}
This stage produces the first asset, a bank of 6,000 hair-free scalp backgrounds used as compositing canvases (Sec.~\ref{sec:synthesis}). To construct these backgrounds, we use a dataset of approximately 6,000 real trichoscopy images collected by Dr.\ Awatef Kelati during her routine clinical practice. For each image, a semantic hair-segmentation model~\cite{long2015fcn,ronneberger2015unet} and a follicle detector generate a hair mask and a follicle mask, respectively. The union of these masks is dilated with a 9-pixel elliptical kernel to close residual gaps and define the region to be removed. We then train an inpainting model based on the LaMa architecture~\cite{suvorov2022lama} to reconstruct the masked scalp regions. During inference, images are resized to $512\times512$, inpainted by the trained model, and resized back to their original resolution (see Fig.~\ref{fig:inpainted_example} for an example).

% Because the inpainting process preserves the follicular openings and underlying scalp texture, each generated background retains its estimated scalp scale $d_i^{\mathrm{scalp}}$ (Sec.~\ref{sec:scale}). This scale information is later used to determine how extracted hair segments are resized and composited onto the background (Eq.~\ref{eq:scale}), ensuring anatomically consistent synthesis across different magnification settings.

Since the follicle mask is part of the inpainted region, the scalp scale $d_i^{\mathrm{scalp}}$ is estimated on the original image before inpainting (Sec.~\ref{sec:scale}) and stored as metadata with the generated background. This scale is later used to determine how extracted hair segments are resized and composited onto the background (Eq.~\ref{eq:scale}), ensuring anatomically consistent synthesis across different magnification settings.

\begin{figure}
    \centering
    \includegraphics[width=0.5\linewidth]{figures/case61-MF_id4_review.jpg}
    \caption{Example of an inpainted, hair-free scalp background.}
    \label{fig:inpainted_example}
\end{figure}

\subsection{Scale Factor Estimation}
\label{sec:scale}
Follicle openings are a convenient anatomical ruler: their physical size is
stable across patients, so their apparent diameter in pixels is a direct
magnification proxy. For every image $i$, we compute a follicle size $d_i$ (in pixels) using a follicle detection model. The scale factor is then $\sigma_i = d_{\text{ref}}/d_i$, where $d_{\text{ref}}$ is the median of $d_i$ over all images. The follicle size sets the segment resize factor
at compositing (Eq.~\ref{eq:scale}), and $\sigma_i$ normalises test images at inference (Sec.~\ref{sec:training}).
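
As an illustration with hypothetical values (not measured on our data), suppose that $d_{\text{ref}} = 40$\,px and that image $i$ has $d_i = 80$\,px, i.e., it was acquired at roughly twice the reference magnification. Then
\[
    \sigma_i = \frac{d_{\text{ref}}}{d_i} = \frac{40}{80} = 0.5,
\]
so resizing the image by $\sigma_i$ halves its side lengths and brings its follicle openings to the reference diameter of 40\,px.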



The follicle detector classifies each follicular unit according to the number of emerging hairs (0, 1, 2, 3, or 4+). Because a follicular unit may contain multiple hair shafts originating from the same opening, units with more hairs tend to occupy a larger bounding box than single-hair units. Consequently, the observed box size is not a direct estimate of the diameter of a single follicular opening. To compensate for this effect, we introduce a class-dependent scaling factor $f_c$ that converts the size $s_j$ of box $j$ into its equivalent single-opening diameter. Furthermore, bounding boxes corresponding to large follicular clusters exhibit greater variability and are therefore less reliable as scale estimates. We account for this uncertainty using a class-specific reliability weight $\rho_c$. Writing $c_j$ for the class of box $j$, the follicle size of image $i$ is estimated as the weighted mean
\begin{equation}
    d_i = \frac{\sum_j \rho_{c_j} \, s_j / f_{c_j}}{\sum_j \rho_{c_j}},
\end{equation}
where the sums run over all follicular units detected in image $i$.

This formulation allows all detected follicular units to contribute to scale estimation while reducing the influence of large multi-hair clusters.
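
To make the effect of the two corrections concrete, consider a hypothetical image with two detections. All numbers, including the values of $f_c$ and $\rho_c$, are illustrative assumptions rather than the settings used in our experiments. A single-hair unit has box size $s_1 = 40$\,px with $f = 1$ and $\rho = 1$, and a three-hair unit has box size $s_2 = 66$\,px with $f = 1.5$ and $\rho = 0.5$. The estimate is then
\[
    d_i = \frac{1 \cdot 40/1 + 0.5 \cdot 66/1.5}{1 + 0.5} = \frac{40 + 22}{1.5} \approx 41.3~\text{px},
\]
whereas a plain average of the raw box sizes would give $(40 + 66)/2 = 53$\,px and overestimate the single-opening diameter.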
 

 

\subsection{Hair Segment Library (Individual hair images)}
\label{sec:library}
 To construct the library of reusable hair instances, we selected a subset of 600 images from the 6,000 trichoscopy images available in our collection for manual instance annotation. Rather than requiring annotators to exhaustively segment every visible hair shaft, which would be prohibitively time-consuming, we adopted a targeted annotation strategy focused on high-quality hair exemplars. Annotators were instructed to label only hairs that were fully contained within the image boundaries, sufficiently isolated to enable single-strand extraction, visually sharp without motion blur, and free from color contamination caused by overlapping hairs.

For each strand, we store an RGBA crop and its binary mask, the root and tip
endpoints obtained by skeleton thinning, the length, mean and maximum diameter,
curvature, root-to-tip orientation, a placement type (anchored or floating), and the source diameter $d_j^{\text{seg}}$.
These attributes drive compositing (Sec.~\ref{sec:synthesis}): diameter and
length govern scale matching and generator eligibility; the root endpoint and
orientation set the rotation pivot and angle; the placement type, together with
the tip endpoint, decides where a strand may be anchored; $d_j^{\text{seg}}$ undoes the
source magnification; and the binary mask, passed through the same transform,
becomes the instance label.




 
% ---------------------------------------------------------------
\subsection{Synthetic Image Generation}
\label{sec:synthesis}
 


 
\noindent\textbf{Scale normalization:}
Let $i$ index a scalp background and $d_i^{\text{scalp}}$ denote its
magnification diameter, estimated as described in
Sec.~\ref{sec:scale}.
Let $j$ index a hair segment selected for compositing in scalp background $i$, and
$d_j^{\text{seg}}$ denote the magnification diameter of the source image
from which that segment was extracted.
The resize factor applied to segment $j$ when placed on scalp $i$ is
\begin{equation}
    s_{ij} = \frac{d_i^{\text{scalp}}}{d_j^{\text{seg}}} \cdot m,
    \label{eq:scale}
\end{equation}
where $d_i^{\text{scalp}}/d_j^{\text{seg}}$ corrects for the zoom-level
difference between the target scalp and the segment source, and the multiplier $m$ simulates zoom variation within the same scalp.
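
For example, under the hypothetical values $d_i^{\text{scalp}} = 30$\,px, $d_j^{\text{seg}} = 60$\,px, and $m = 1.1$ (chosen only for illustration), the resize factor is
\[
    s_{ij} = \frac{30}{60} \cdot 1.1 = 0.55,
\]
so a shaft that is 12\,px wide in its source image would be composited at about $12 \times 0.55 = 6.6$\,px on this background.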


\noindent\textbf{Clinical diversity: }
Five generators reproduce distinct clinical presentations
(Fig.~\ref{fig:synth_exp}): standard follicle-anchored fields, edge-to-edge
floating shed hairs (diffuse alopecia), close-up stubs (high
magnification), dual-direction growth (crown or parting), and radial fans
(alopecia areata, lichen planopilaris).


\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figures/synthetic_data_example.png}
    \caption{Synthetic examples from the five generators (left to right): standard, floating-dominant, close-up stub, dual-direction, and radial fan.}
    \label{fig:synth_exp}
\end{figure}
\noindent\textbf{Photometric augmentation: }
After all hairs are composited, a single shared degradation is applied
to the whole image: radial vignetting (ring-light falloff), Gaussian blur
(depth-of-field), additive noise, and JPEG re-encoding.
Applying the degradation globally ensures that hair boundaries receive the
same treatment as the surrounding scalp, leaving no boundary artefact
that a model could exploit as a shortcut.

\section{Model Training}
\label{sec:training}

We train YOLO26-large-seg~\cite{ultralytics_yolo26} for 500 epochs on
20{,}000 synthetic images\footnote{\url{https://github.com/AIpourlapeau/hair_synthetic_dataset}} (90\,/\,10 train\,/\,val split)
with the Adam optimizer, an initial learning rate of $10^{-4}$, and cosine annealing to
$10^{-6}$.
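
For reference, and assuming the standard cosine schedule without warm-up or restarts (the exact scheduler implementation may differ), the learning rate at epoch $t$ of $T = 500$ would be
\[
    \eta_t = \eta_{\min} + \tfrac{1}{2}\,(\eta_0 - \eta_{\min})\left(1 + \cos\frac{\pi t}{T}\right),
    \qquad \eta_0 = 10^{-4},\ \eta_{\min} = 10^{-6},
\]
which gives $\eta_{250} = 5.05 \times 10^{-5}$ at the midpoint of training.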
% Four modifications address failure modes specific to sub-5-pixel elongated
% objects that are absent in standard benchmarks.

\textbf{Scale-matched geometric augmentation.}
During training, images are first normalized using the scale factor in Eq.~\ref{eq:scale}. This normalization lets the model learn at a consistent physical scale while still observing controlled magnification variations, which improves robustness to the diverse imaging conditions encountered at inference without requiring ad-hoc augmentation ranges.


\textbf{Independent instance masks (\texttt{overlap\_mask\,=\,False}).}
By default, YOLO merges all overlapping instance masks into a single binary
canvas per image.
For crossing hairs, this collapses two physically distinct shafts into one
connected region, forcing the model to explain a single blob with two
separate predictions. Such an ambiguous target degrades convergence on dense
crossing configurations.
Disabling mask merging preserves one independent binary mask per instance,
matching the ground truth our pipeline generates by construction.



\section{Experimental Results}

\paragraph{Dataset: }
We evaluate on 365 real trichoscopy images spanning multiple devices and
magnification levels and covering androgenetic alopecia, alopecia areata, lichen
planopilaris, and healthy controls.
No test image was used in any training step.
% added for camera ready
Per-hair instance polygons for all 365 test images were annotated using Label Studio by trained annotators; the 600 source images used for segment extraction (Sec.~\ref{sec:library}) were partially annotated under the same protocol.


\paragraph{Inference variants: }
Five configurations isolate the effect of each preprocessing component.

\textbf{Variant~A.} Default YOLO evaluation settings, without scale normalization.

\textbf{Variant~B.} Image $i$ is first resized by the scale factor $\sigma_i = d_{\text{ref}}/d_i$ and then resized again so that each dimension is the nearest multiple of 32, as required for YOLO inference. The final size may therefore be slightly smaller or larger than the size obtained after applying $\sigma_i$.

\textbf{Variant~C.} The image is resized by $\sigma_i$ as above and then divided into $640\times640$ tiles, using padding or cropping where necessary. YOLO performs inference on each tile independently. The detections from all tiles are then mapped back to their locations in the original image coordinate system and merged.

We additionally evaluate two union ensembles: ($B \cup C$), which combines the normalized variants, and ($A \cup B \cup C$), which combines all three variants.
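
A possible formalisation of the resizing in Variant~B, assuming that width and height are rounded independently (the symbols $W$, $H$, $W'$, and $H'$ are introduced here only for illustration), is
\[
    W' = 32 \left\lfloor \frac{\sigma_i W}{32} + \frac{1}{2} \right\rfloor,
    \qquad
    H' = 32 \left\lfloor \frac{\sigma_i H}{32} + \frac{1}{2} \right\rfloor .
\]
For a hypothetical $1000\times750$ image with $\sigma_i = 0.7$, the scaled size is $700\times525$, and rounding gives $W' = 32 \cdot 22 = 704$ and $H' = 32 \cdot 16 = 512$.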

\paragraph{Metrics:} We report mean average precision at IoU\,$\geq$\,0.50 for bounding boxes (Box~mAP$_{50}$)~\cite{lin2014coco} and Jaccard similarity (mean per-image pixel IoU between the union of predicted masks and the union of GT masks).
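
Written out, with $P_i$ and $G_i$ denoting the sets of pixels covered by the union of predicted masks and the union of GT masks in image $i$ (notation introduced here for illustration), and $N = 365$ test images, the Jaccard score corresponds to
\[
    \text{Jaccard} = \frac{1}{N} \sum_{i=1}^{N} \frac{|P_i \cap G_i|}{|P_i \cup G_i|} .
\]
Because instance masks are merged before the overlap is computed, this score measures pixel coverage only and is insensitive to how the covered pixels are split into instances.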

\begin{table}[t]
\centering
\caption{Inference preprocessing comparison on the 365-image real test set, with
  95\,\% bootstrap CIs (50\,000 resamples).
   Box mAP$_{50}$ CIs use per-image AP$_{50}$ mean (approximation); Jaccard CIs are exact.}
\label{tab:inference}
\setlength{\tabcolsep}{3pt}
{\small
\begin{tabular}{lcc}
\toprule
Method & Box mAP$_{50}$ & Jaccard \\
\midrule
1.\ Default -- Variant A
  & 0.50\,{\scriptsize[0.49,\,0.54]}
  & 0.49\,{\scriptsize[0.48,\,0.52]} \\
2.\ Rescale only -- Variant B
  & 0.47\,{\scriptsize[0.45,\,0.50]}
  & 0.48\,{\scriptsize[0.47,\,0.51]} \\
3.\ Rescale\,+\,tile\,+\,stitch -- Variant C
  & 0.32\,{\scriptsize[0.30,\,0.40]}
  & 0.54\,{\scriptsize[0.52,\,0.56]} \\
4.\ B\,$\cup$\,C (norm.\ union)
  & 0.43\,{\scriptsize[0.42,\,0.47]}
  & 0.53\,{\scriptsize[0.51,\,0.55]} \\
5.\ A\,$\cup$\,B\,$\cup$\,C (full union)
  & \textbf{0.52}\,{\scriptsize[0.51,\,0.56]}
  & \textbf{0.54}\,{\scriptsize[0.52,\,0.56]} \\
\bottomrule
\end{tabular}
}
\par\smallskip
\end{table}



\begin{table}[t]
\centering
\caption{Comparison with the semantic segmentation baseline.}
\label{tab:comparison}
\setlength{\tabcolsep}{5pt}
\begin{tabular}{lcccc}
\toprule
Method & Jaccard & Count MAE & Count MAPE & $r$ \\
\midrule
Semantic seg.\ + postprocessing~\cite{deng2024crossfiber}  & 0.49 & 24.2 & 33.5\% & 0.87 \\
Instance seg.\ A$\cup$B$\cup$C (ours)  & \textbf{0.54} & \textbf{13.7} & \textbf{32.3\%} & \textbf{0.93} \\
\bottomrule
\end{tabular}
\end{table}
% \paragraph{Results analysis:}
Tables~\ref{tab:inference} and~\ref{tab:comparison} summarise the results. Variant~A has the best single-variant
Box~mAP$_{50}$ (0.50) because it produces clean individual detections.
Variant~C has the highest single-variant Jaccard (0.54) but drops to 0.32 in Box~mAP$_{50}$,
because stitching merges seam-adjacent segments into blobs that cover the
correct pixels but span multiple GT instances.

Only A$\cup$B$\cup$C (row~5) attains the best value on both metrics at once:
it raises Box~mAP$_{50}$ above the best single variant (0.50$\to$0.52, Variant~A)
while matching the best single-variant Jaccard (0.54, Variant~C),
which suggests that the three variants are complementary.
Table~\ref{tab:comparison} shows that this configuration also outperforms the
semantic segmentation baseline on every shared metric, reducing count MAE by
43\,\% (24.2$\to$13.7\,hairs) and improving counting correlation from 0.87
to 0.93. On this test set, explicit per-shaft instance prediction is therefore
more accurate for hair counting than postprocessing a semantic mask.
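
The relative MAE reduction follows directly from Table~\ref{tab:comparison}:
\[
    \frac{24.2 - 13.7}{24.2} = \frac{10.5}{24.2} \approx 0.434,
\]
i.e., about 43\,\%. In contrast, count MAPE changes only from 33.5\,\% to 32.3\,\%, a difference of 1.2 percentage points.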



\section{Discussion and Conclusion}
The results reveal three main sources of error. First, white and depigmented hairs are absent from the synthetic dataset, causing systematic false negatives. Second, GT annotations include a thin background margin, which penalizes tight predictions in the Jaccard score. Third, very short hairs appear in the test set but not in the synthetic corpus. The task is also intrinsically challenging: scale-normalized shafts are only 4--10\,px wide, so even a one-pixel offset is significant.

We introduced a scale-aware synthetic compositing pipeline for per-hair instance segmentation in trichoscopy, combining inpainted scalp backgrounds with a reusable hair library under follicle-diameter calibration. Experiments show that scale normalization combined with tiling improves pixel coverage (Jaccard 0.49$\to$0.54), while only the full union recovers both localization and segmentation performance simultaneously. Future work should expand synthetic coverage to white and very short hairs and refine GT annotations to further reduce the remaining performance gap.




\bibliographystyle{splncs04}
\bibliography{cite}


\end{document}


