
\section{Methodology} \label{sec:method}

\subsection{Cubical Multiparameter Persistence} \label{sec:MP}


\paragraph{From single to multiparameter persistence.}
In single-parameter cubical persistence, a grayscale (or single-channel) image \(\X\) induces a filtration \(\{\X_n\}_{n=1}^N\) indexed by thresholds \(\{t_n\}\), where \(\X_n\) is the binary image obtained by activating pixels with intensity below \(t_n\).
Persistent homology then tracks when connected components, holes, and higher-dimensional features appear and disappear as the threshold increases, and summarizes them with barcodes or persistence diagrams.

In multiparameter persistence (MP), we let the image evolve along two or more directions at once.
We focus on the two-parameter case.
A \emph{bifiltration} of an \(r\times s\) image \(\X\) is a family \(\{\X_{m,n}\}\), \(1\le m\le M\), \(1\le n\le N\), of binary images such that
\(\X_{m,n} \subset \X_{m+1,n}\) and \(\X_{m,n} \subset \X_{m,n+1}\) whenever both indices are in range.
Each row and each column is a standard 1D filtration, and together they form a grid of nested images.
Applying homology at each grid point gives a collection of topological features that now live over a 2D index set instead of a single line.
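By transitivity, the two one-step inclusions combine into a monotonicity condition over the whole grid, where \((m',n')\) denotes a second grid index:
\[
\X_{m,n} \subset \X_{m',n'} \quad \text{whenever } m \le m' \text{ and } n \le n',
\]
whereas incomparable index pairs such as \((m+1,n)\) and \((m,n+1)\) need not be related by inclusion in either direction.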

A key difference from the single-parameter case is that there is no unique way to assign a single birth and death time to each feature, since the indices \((m,n)\) are only partially ordered.
As a result, there is no canonical barcode or persistence diagram in general multiparameter settings~\cite{botnan2022introduction}.
Several alternative summaries have been proposed, such as rank invariants and MP landscapes~\cite{vipond2021multiparameter,loiseaux2023stable}, but most remain computationally expensive for large-scale imaging applications.





\paragraph{Betti tensors as multiparameter signatures.}
In this work we adopt a simple but effective summary based on Betti numbers over the grid.
For each grid point \((m,n)\) and each homological dimension \(k\), we define

\centerline{$\beta^k_{m,n} \;=\; \#\{\text{$k$-dimensional topological features in } \X_{m,n}\}.$}

Collecting these values over the grid yields a 2D \emph{Betti tensor}
\quad $\mathbf{B}_k(\X) = [\beta^k_{m,n}] \in \mathbb{N}^{M\times N}.$
Intuitively, \(\mathbf{B}_0(\X)\) records how the number of connected components changes across two parameters, and \(\mathbf{B}_1(\X)\) does the same for holes.
Unlike persistence diagrams, Betti tensors do not distinguish long-lived from short-lived features, but they provide a compact, grid-aligned representation that is easy to store and to feed into neural networks.
This type of Betti-based encoding has been empirically effective in several medical and histopathological imaging tasks~\cite{qaiser2019fast,yadav2023histopathological,du2022distilling}.
Additional details and a more formal connection to the multipersistence literature are given in Appendix~\ref{app:MP}.



\paragraph{Color multifiltrations for dermoscopic images.}
RGB dermoscopic images naturally support multiparameter filtrations~\cite{korkmaz2025cumperlay}.
Let \(\X\) be an RGB image with channel values \(R_{ij}, G_{ij}, B_{ij} \in [0,255]\) for each pixel (cubical cell) \(\Delta_{ij}\).
In general, choosing threshold sets
$\{s_m\}_{m=1}^{N_1},\quad \{t_n\}_{n=1}^{N_2},\quad \{v_r\}_{r=1}^{N_3}$
for the three channels defines a three-parameter multifiltration
$\X_{m,n,r} = \left\{\Delta_{ij} \subset \X \mid R_{ij} \le s_m,\; G_{ij} \le t_n,\; B_{ij} \le v_r \right\},$
and corresponding 3D Betti tensors \([\beta^k_{m,n,r}] \in \mathbb{N}^{N_1\times N_2\times N_3}\).
Figure~\ref{fig:bifiltration} shows a small toy example with a \(3\times 3\) grid for clarity.

\begin{wrapfigure}{r}{3.2in}
\vspace{-.2in}
    \centering
    \includegraphics[width=\linewidth]{figures/bifiltration.png}
    \vspace{-.15in}
\caption{\footnotesize \textbf{Toy example.}
For an image \(\X\) with two color channels, a simple color bifiltration produces a \(3\times 3\) grid of binary images (our experiments use a \(20\times 20\) grid). Horizontally, pixels are activated (colored orange) when their red value falls below the threshold; vertically, activation depends on the blue value. Each row and column forms an ordinary one-dimensional filtration, while the grid as a whole defines a two-parameter filtration.}
    \label{fig:bifiltration}
    \vspace{-.4in}
\end{wrapfigure}
For our experiments we adopt a computationally efficient two-parameter specialization tailored to dermoscopic images.
We construct a bifiltration over the red and green channels (identified as most informative on a validation set) using \(M=N=20\) thresholds per channel to form a \(20\times 20\) grid.
For each image we compute the corresponding \(\beta_0\) and \(\beta_1\) tensors together with activated-pixel counts, and stack them into a \(3\times 20\times 20\) \emph{topological image}.
This multipersistence representation is used both as input to XGBoost baselines and as the topological branch in the TopoCon-MP fusion model.
A more formal discussion of multiparameter persistence, barcode obstructions, and alternative summaries (including our Betti tensor view) is provided in Appendix~\ref{app:MP}.
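
As a sketch in the notation above, the two-channel specialization can be written out explicitly. The symbol \(a_{m,n}\) for the activated-pixel count and the ordering of the three channels are shorthand introduced here for illustration; the threshold values \(s_m, t_n\) are left unspecified, as in the text.
\[
\X_{m,n} = \left\{\Delta_{ij} \subset \X \mid R_{ij} \le s_m,\; G_{ij} \le t_n \right\},
\qquad
a_{m,n} = \left|\X_{m,n}\right|,
\qquad 1 \le m,n \le 20,
\]
so that the topological image is the stack \(\bigl([\beta^0_{m,n}],\,[\beta^1_{m,n}],\,[a_{m,n}]\bigr)\) of shape \(3\times 20\times 20\). Note that \(a_{m,n}\) is nondecreasing in both indices because the binary images are nested, whereas the Betti numbers need not be monotone, since components can merge and holes can fill in as the thresholds increase.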




\vspace{-.1in}




\subsection{Topology-Aware Supervised Contrastive Learning} \label{sec:constrast}

Supervised contrastive learning encourages representations that cluster samples from the same class while separating those from different classes. Standard supervised contrastive methods typically construct multiple ``views'' of each image using random augmentations such as cropping, rotation, or intensity jitter. In dermoscopy, however, aggressive spatial augmentations can distort lesion boundaries or alter diagnostically relevant texture patterns.



\begin{figure}[t]
\centering
\includegraphics[width=.9\linewidth]{figures/flowchart2.png}
\caption{\footnotesize \textbf{Overview of TopoCon-MP.}
The raw dermoscopic image is processed by a pretrained Swin Transformer backbone to obtain semantic image features.
In parallel, we compute multiparameter Betti tensors on a fixed grid and stack \(\beta_0\), \(\beta_1\), and activated-pixel counts into a \(3\times 20\times 20\) topological image.
This topological image is encoded with an MLP and aligned with the Swin features via a topology-aware supervised contrastive loss.
The fused representation is passed to a final classifier for benign vs.\ malignant prediction.}
\label{fig:complete-flowchart}
\vspace{-.3in}
\end{figure}


We therefore propose a \emph{topology-aware} supervised contrastive framework that uses the original dermoscopic image and its multiparameter topological embedding as two semantically consistent views of the same case (see Fig.~\ref{fig:complete-flowchart}). For each input image \(I\), we compute its cubical multiparameter persistence representation, which captures structural and morphological characteristics in a label-preserving and anatomically coherent way. The resulting bifiltration produces a topological image \(\Psi(I) \in \mathbb{R}^{3 \times 20 \times 20}\), where the three channels correspond to \(\beta_0\), \(\beta_1\), and the activated-pixel counts over the threshold grid. This RGB-style topological image emphasizes global topology and boundary structure without introducing augmentation-induced bias. Such bias is a common issue in medical imaging, where strong contrastive augmentations (e.g., heavy color jitter, aggressive cropping, blur) can distort clinically meaningful cues and produce label-inconsistent views, encouraging invariances that are undesirable for diagnosis.



\begin{wrapfigure}{r}{3in}
\vspace{-.1in}
\centering
\includegraphics[width=\linewidth]{figures/topocon_mp2.png}
\vspace{-0.3in}
\caption{\footnotesize \textbf{Supervised contrastive framework.} A dermoscopic image and its multipersistence image (toy $3\times3\times3$ example) are encoded by separate networks. Their embeddings are used for classification and a supervised contrastive loss that aligns image and topology representations.}

\label{fig:topocon_mp}
\vspace{-0.25in}
\end{wrapfigure}
An image encoder \(f_\theta(\cdot)\) (a pretrained Swin Transformer backbone with a linear head) and a topology encoder \(g_\phi(\cdot)\) (an MLP on \(\Psi(I)\)) produce latent embeddings
 $z_I = f_\theta(I), \qquad z_T = g_\phi(\Psi(I)).$
These embeddings are concatenated and fed to a classifier for the main lesion classification task. In parallel, they are mapped through projection heads and used in a supervised contrastive loss: samples that share the same class label, whether they come from the image branch or the topology branch, are treated as positives, and samples from different classes are treated as negatives (following the formulation of Khosla et al.).
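
For reference, one common way to write the supervised contrastive loss of Khosla et al.\ in this two-branch setting is sketched below. The notation is introduced here for illustration and is not fixed by the text above: \(\mathcal{I}\) indexes all projected embeddings in a minibatch (one from each branch per sample), \(u_i\) is the \(\ell_2\)-normalized projection with class label \(y_i\), \(P(i) = \{p \in \mathcal{I}\setminus\{i\} : y_p = y_i\}\) is the set of positives for anchor \(i\), and \(\tau > 0\) is a temperature; the normalization and the temperature are assumptions carried over from that formulation.
\[
\mathcal{L}_{\text{SupCon}}
= \sum_{i \in \mathcal{I}} \frac{-1}{|P(i)|} \sum_{p \in P(i)}
\log \frac{\exp\left(u_i \cdot u_p / \tau\right)}{\sum_{a \in \mathcal{I}\setminus\{i\}} \exp\left(u_i \cdot u_a / \tau\right)}.
\]
Under this form, the image and topology embeddings of the same sample always share a label and therefore always act as positives for one another.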



The final training objective combines cross-entropy and supervised contrastive losses,\quad 
$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \lambda \,\mathcal{L}_{\text{SupCon}},$\quad
where \(\lambda\) balances the discriminative and alignment terms (see Fig.~\ref{fig:topocon_mp}). This objective aligns image and topology embeddings in a class-consistent manner while preserving classification performance. By using topological representations as label-preserving views rather than relying only on random augmentations, TopoCon-MP provides contrastive supervision better suited to limited-data, anatomy-sensitive medical imaging.






