\centerline{\textbf{\Large Appendix}}

\section{Multiparameter Persistence} \label{app:MP}

Multiparameter persistence has attracted growing interest for its potential to enrich standard persistent homology. In principle, a multidimensional filtration with several parameters should yield richer topological summaries for machine learning than a one-parameter filtration. However, key technical obstacles have limited its practical impact.

In single-parameter persistence, the threshold space $\{\alpha_i\}$ is totally ordered, so each topological feature in the filtration $\{\Delta_i\}$ has well-defined birth and death times. This makes it possible to decompose the associated persistence module $M = \{H_k(\Delta_i)\}_{i=1}^{N}$ into a multiset of intervals (barcodes) via a structure theorem~\cite{botnan2022introduction}, which underlies persistence diagrams. For two or more parameters, the threshold set $\{(\alpha_i,\beta_j)\}$ is only partially ordered. Birth and death times are no longer uniquely defined, the one-parameter decomposition theorem does not extend~\cite{botnan2022introduction}, and barcode representations typically fail to exist or are difficult to describe in a finite way. As a result, a direct barcode-style generalization of single-parameter persistence is usually not available, and the classification and invariants of multiparameter modules remain an active area of research in commutative algebra~\cite{eisenbud2013commutative}.
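
For reference, the one-parameter statement can be written out as follows. This is a standard formulation in our own shorthand: the symbols $b_\ell$, $d_\ell$, $m$, and $I_{[b_\ell,d_\ell)}$ are introduced only for this illustration, and we assume homology with coefficients in a field.
\[
M \;\cong\; \bigoplus_{\ell=1}^{m} I_{[b_\ell,d_\ell)}, \qquad 1 \le b_\ell < d_\ell \le N+1,
\]
where $I_{[b,d)}$ denotes the interval module that is one-dimensional at indices $b \le i < d$ and zero elsewhere, and $d_\ell = N+1$ marks a feature that never dies. In particular, $\dim H_k(\Delta_i) = \#\{\, \ell : b_\ell \le i < d_\ell \,\}$, so the Betti number at index $i$ is the number of bars alive at $i$. For modules indexed by the grid $\{(\alpha_i,\beta_j)\}$, no decomposition into such interval summands is available in general.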

Despite these challenges, several slicing-based methods have been proposed to make use of multiparameter filtrations~\cite{lesnick2015theory,carriere2020multiparameter,botnan2022introduction}. These approaches analyze one-dimensional slices of the multiparameter grid, compute standard persistence diagrams along each slice, and then aggregate the resulting diagrams into vectorized summaries. While effective in some settings, they face two main limitations: the summaries can depend strongly on the choice of slicing directions, and compressing information from many diagrams into a low-dimensional representation may introduce substantial information loss; see \cite{botnan2022introduction} for a detailed overview.

In this work, we adopt a different strategy that avoids slicing. We work directly with the Betti numbers on the grid: for each grid point and each homological dimension $k$, we record the rank of $H_k$ and collect these ranks into Betti tensors. This can be viewed as evaluating the Hilbert function of the underlying multiparameter module on a fixed finite grid. The resulting Betti tensors provide a simple, grid-aligned summary that is easy to compute and to feed into neural networks, and they have been empirically effective in several imaging applications~\cite{qaiser2019fast,yadav2023histopathological,du2022distilling,ali2023survey}.
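
One way to write this construction explicitly is sketched below. The notation $\Delta_{i,j}$, $m$, $n$, and $\mathbf{B}_k$ is ours and is introduced only for illustration; it is not necessarily the exact definition used in the main text. Given a bifiltration $\{\Delta_{i,j}\}$ on an $m \times n$ threshold grid with $\Delta_{i,j} \subseteq \Delta_{i',j'}$ whenever $i \le i'$ and $j \le j'$, set
\[
\mathbf{B}_k[i,j] \;=\; \dim H_k(\Delta_{i,j}), \qquad 1 \le i \le m,\; 1 \le j \le n .
\]
For two-dimensional images only $k=0$ and $k=1$ carry information, so the descriptor consists of two $m \times n$ arrays. Unlike a barcode, $\mathbf{B}_k$ records only the pointwise dimensions and not the maps $H_k(\Delta_{i,j}) \to H_k(\Delta_{i',j'})$ induced by inclusion, which is why it is cheap to compute but does not retain lifetime information.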

\begin{figure}[h!]
    \centering
    \includegraphics[width=\linewidth]{figures/erosion2.pdf}
    \caption{\footnotesize \textbf{Erosion filtration.} For a given binary image $\X$, we first define the erosion function (shown in $\X_0$). We then obtain a filtration of binary images $\X_0\subset\X_1\subset \dots \subset\X_3$ by activating pixels that reach the threshold value.}
    \label{fig:erosion}
\end{figure}

\paragraph{Other bifiltration examples.}
A well-known limitation of grayscale sublevel filtrations is their inability to encode the \emph{size} of topological features; they only reflect differences in function values between the birth and death of a feature. For example, consider a grayscale image where all pixels have intensity 0 except for a single central pixel with intensity 255. The resulting persistence diagram contains a single long bar $[0,255)$, even though the corresponding hole has diameter 1. Conversely, a binary image $\X_{100}$ might contain a large hole of diameter 20 whose pixels have intensities in $[101,105]$, so the hole is completely filled by $\X_{105}$. Despite the dramatic change in geometric size, the grayscale sublevel filtration produces only a short bar $[100,105)$, encoding the contrast but not the spatial scale of the hole.

In other words, while persistent homology identifies which topological features appear in a filtration, standard sublevel filtrations do not, by themselves, capture their geometric size. To address this, alternative filtrations such as \emph{erosion, dilation}, and \emph{signed-distance-based filtrations} have been proposed~\cite{garin2019topological}. These constructions explicitly incorporate scale information and thus complement grayscale sublevel filtrations. In particular, one can combine a grayscale or color channel with an erosion or distance-based filtration to obtain meaningful multiparameter persistence signatures for images.
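
As a sketch of how such a combination could be set up, one common choice is a joint sublevel bifiltration. The functions $f$ and $g$ and the thresholds $\alpha_i$, $\beta_j$ below are illustrative assumptions: $f$ stands for a pixel intensity (grayscale or one color channel) and $g$ for a scale function such as an erosion or signed-distance value.
\[
\X_{i,j} \;=\; \{\, p \;:\; f(p) \le \alpha_i \ \mathrm{and}\ g(p) \le \beta_j \,\}, \qquad \alpha_1 < \dots < \alpha_m,\quad \beta_1 < \dots < \beta_n .
\]
By construction $\X_{i,j} \subseteq \X_{i',j'}$ whenever $i \le i'$ and $j \le j'$, so the binary images form a bifiltration: moving along $i$ tracks contrast, while moving along $j$ tracks spatial scale.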

\paragraph{Why Betti tensors for multipersistence.}
Multiparameter persistence offers a rich structural summary, but in practice \emph{vectorization is a primary bottleneck}: many MP representations that retain lifetime information (e.g., signed-barcode measures) can be expensive, sensitive to design choices, and difficult to integrate with standard deep pipelines at scale. We therefore adopt Betti tensors as a deliberate tradeoff between expressivity and learnability. Betti tensors preserve the intrinsic 2D bifiltration geometry as an \emph{image-like} object, enabling simple encoders and contrastive fusion to exploit local and global patterns in the $(r,g)$ filtration grid, whereas more expressive MP vectorizations often collapse this grid structure into an unordered feature set. This choice is also motivated by the data regime: methods with learnable filtrations or heavy MP representation learning (e.g., CuMPerLay-style models~\cite{carriere2020perslay,korkmaz2025cumperlay}) typically benefit from substantially larger datasets and careful tuning, while our goal is a lightweight, stable MP descriptor that works reliably with limited medical data. Finally, as illustrated by the Betti curve visualizations (Fig.~7), the discriminative signal appears distributed across many small topological events (curve density over thresholds) rather than a few dominant features, which is naturally captured by Betti curves and their tensorized MP extension.


\section{Sensitivity Analysis} \label{sec:ablation2}

Two tables provide a small sensitivity analysis of our topological descriptors with respect to filtration discretization and channel selection. Table~\ref{tab:threshold_sensitivity_milk10k} varies the number of intensity thresholds used to discretize single-parameter persistence on MILK-10K while keeping the downstream classifier fixed. Increasing the resolution from 50 to 100 to 250 thresholds changes the feature dimensionality from 600 to 1200 to 3000, yet the performance remains stable: AUC changes by at most 0.4 points, and accuracy, F1, sensitivity, and specificity change by at most 1.9 points. This indicates that the extracted TDA signal is not overly dependent on a particular threshold granularity once a reasonable resolution is used. Table~\ref{tab:sp_channel_sensitivity} reports single-channel, single-parameter persistence features computed from the red, green, or blue channel (same feature budget) across DermaMNIST, MILK-10K, and PAD. The relative ordering across channels differs by dataset, but overall performance is comparable, suggesting that no single channel is universally dominant and motivating our use of complementary channels in the multipersistence construction. Together, these results suggest that the proposed topological features are reasonably robust to practical choices in filtration discretization and channel selection.
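
The feature counts in both tables follow the same arithmetic, which we spell out here. The symbols $T$, $s$, and $C$ are our own shorthand. Reading $s=3$ as the number of descriptors per threshold and $C$ as the number of channels is our interpretation, chosen because it is consistent with the 150 features reported for a single channel.
\[
d \;=\; T \times s \times C, \qquad s = 3 .
\]
With $C=4$ this gives $d = 600$, $1200$, and $3000$ for $T = 50$, $100$, and $250$, and with $C=1$ and $T=50$ it gives $d = 150$.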


\begin{table}[t]
\centering
\footnotesize
\caption{Sensitivity to threshold resolution for single-parameter persistence features on MILK-10K (all channels).}
\label{tab:threshold_sensitivity_milk10k}
\begin{tabular}{l c c c c c c c}
\toprule
Dataset & \#thresholds & \#features (all channels) & AUC & Acc & F1 & Sens & Spec \\
\midrule
MILK-10K & 50  & $50 \times 3 \times 4 = 600$   & 74.9 & 59.5 & 19.6 & 19.8 & 93.8 \\
MILK-10K & 100 & $100 \times 3 \times 4 = 1200$ & 74.8 & 60.1 & 20.4 & 20.4 & 94.0 \\
MILK-10K & 250 & $250 \times 3 \times 4 = 3000$ & 74.5 & 60.4 & 21.5 & 21.2 & 94.0 \\
\bottomrule
\end{tabular}
\end{table}


\begin{table*}[t]
\centering
\small
\caption{Single-parameter persistence (SP) features using a single color channel (50 thresholds; 150 features).}
\label{tab:sp_channel_sensitivity}
\resizebox{\textwidth}{!}{
\begin{tabular}{l c ccccc| ccccc | ccccc}
\toprule
\multirow{2}{*}{TDA model} & \multirow{2}{*}{\#Features} 
& \multicolumn{5}{c}{DermaMNIST} 
& \multicolumn{5}{c}{MILK-10K} 
& \multicolumn{5}{c}{PAD} \\
\cmidrule(lr){3-7}\cmidrule(lr){8-12}\cmidrule(lr){13-17}
 &  & AUC & Acc & F1 & Sens & Spec
    & AUC & Acc & F1 & Sens & Spec
    & AUC & Acc & F1 & Sens & Spec \\
\midrule
SP\_Red   & 150 & 83.4 & 70.0 & 28.2 & 25.6 & 88.9 & 71.5 & 57.4 & 17.3 & 18.0 & 93.4 & 71.1 & 44.7 & 23.5 & 25.0 & 86.2 \\
SP\_Green & 150 & 85.4 & 71.2 & 31.6 & 28.5 & 89.9 & 75.6 & 57.6 & 17.9 & 18.3 & 93.4 & 72.3 & 47.8 & 27.7 & 29.0 & 87.0 \\
SP\_Blue  & 150 & 87.7 & 71.0 & 35.6 & 31.6 & 90.5 & 72.3 & 57.6 & 17.6 & 18.3 & 93.6 & 73.7 & 51.3 & 31.4 & 32.2 & 87.8 \\
\bottomrule
\end{tabular}}
\end{table*}


\section{Visualizations and Interpretability of Topological Descriptors}
\label{app:visualization}

To better understand what information our multipersistence descriptors capture, we
visualize classwise Betti curves and median multipersistence heatmaps for the MILK-10K
dataset.


\paragraph{Betti curves across color channels.}
Figures~\ref{fig:betti}, \ref{fig:betti2}, and \ref{fig:betti3} show the mean Betti curves with 40\% confidence
bands for five lesion classes (BCC, NV, BKL, MEL, AKIEC) across red, green, blue, and
grayscale filtrations. The top row plots $\beta_0$ (number of connected components)
as a function of the intensity threshold and the bottom row plots $\beta_1$ (number
of holes). The solid (top) and dashed (bottom) lines denote the classwise means, while
the shaded regions mark the central 40\% of subjects for each class.
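
The plotted quantities can be written as follows. This is a sketch under our own assumptions: the symbols $S_c$, $\beta_k^{(n)}$, and $Q_q$ are introduced for illustration, and we assume that the bands are computed pointwise in the threshold $t$ as the interval between the 30th and 70th percentiles, which is one natural reading of ``central 40\%''.
\[
\bar{\beta}_k^{(c)}(t) \;=\; \frac{1}{|S_c|} \sum_{n \in S_c} \beta_k^{(n)}(t),
\qquad
\mathrm{band}_k^{(c)}(t) \;=\; \bigl[\, Q_{0.30}\{\beta_k^{(n)}(t)\}_{n \in S_c},\; Q_{0.70}\{\beta_k^{(n)}(t)\}_{n \in S_c} \,\bigr],
\]
where $S_c$ is the set of images in class $c$, $\beta_k^{(n)}(t)$ is the $k$-th Betti number of image $n$ at threshold $t$, and $Q_q$ denotes the empirical $q$-quantile.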

Several consistent patterns emerge. First, BKL and AKIEC lesions tend to exhibit higher
and broader $\beta_0$ and $\beta_1$ peaks, reflecting a larger number of small
islands and holes across intermediate thresholds. This is compatible with their
irregular, mottled pigmentation patterns in dermoscopy. In contrast, NV lesions show
lower-amplitude curves and narrower peaks, consistent with more homogeneous, compact
nevi. MEL curves typically peak at slightly darker thresholds than NV, suggesting
richer structure in darker pigment regions, which is in line with the irregular
networks and focal globules often seen in melanoma. Across channels, the red and
grayscale filtrations yield the strongest separation between classes, which motivated
our choice of red–green bifiltrations for the main multipersistence pipeline.



\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/betti_curves_Milk10k.png}
\caption{\footnotesize 
\textbf{Betti curves for MILK-10K.}
Mean Betti curves with 40\% confidence bands for five MILK-10K lesion classes
(BCC, NV, BKL, MEL, AKIEC) across color channels.
Columns correspond to red, green, blue, and grayscale intensity filtrations.
The top row shows $\beta_0$ (number of connected components) and the bottom row
shows $\beta_1$ (number of holes) as functions of the threshold value.
Solid (top) and dashed (bottom) lines indicate classwise means, and shaded
regions mark the central 40\% of subjects per class, revealing systematic
differences in multiscale topology between lesion types.
}\label{fig:betti}
\end{figure}
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/betti_curves_DermaMNIST.png}
\caption{\footnotesize 
\textbf{Betti curves for DermaMNIST.}
Mean Betti curves with 40\% confidence bands for five lesion classes
(BCC, NV, BKL, MEL, AKIEC) across color channels.
Columns correspond to red, green, blue, and grayscale intensity filtrations.
The top row shows $\beta_0$ (number of connected components) and the bottom row
shows $\beta_1$ (number of holes) as functions of the threshold value.
Solid (top) and dashed (bottom) lines indicate classwise means, and shaded
regions mark the central 40\% of subjects per class, revealing systematic
differences in multiscale topology between lesion types.} 
\label{fig:betti2}
\end{figure}


\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/betti_curves_PAD.png}
\caption{\footnotesize 
\textbf{Betti curves for PAD-UFES.}
Mean Betti curves with 40\% confidence bands for five lesion classes
(BCC, NV, BKL, MEL, AKIEC) across color channels.
Columns correspond to red, green, blue, and grayscale intensity filtrations.
The top row shows $\beta_0$ (number of connected components) and the bottom row
shows $\beta_1$ (number of holes) as functions of the threshold value.
Solid (top) and dashed (bottom) lines indicate classwise means, and shaded
regions mark the central 40\% of subjects per class, revealing systematic
differences in multiscale topology between lesion types.}
\label{fig:betti3}
\end{figure}

\paragraph{Multipersistence heatmaps.}
While Betti curves summarize topology along a single intensity axis, our model uses
bifiltrations over red and green channels. Figure~\ref{fig:MP-heatmaps-milk} displays
classwise median multipersistence heatmaps on the $20\times 20$ red–green grid. Each
row corresponds to one lesion class, and the three columns show $\beta_0$, $\beta_1$,
and activated-pixel counts, respectively. Color encodes the median value at each grid
point, so bright regions indicate parameter ranges where many connected components,
holes, or pixels are present.
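
In formulas, each heatmap entry can be read as a classwise pointwise median. The notation below ($S_c$, $r_i$, $g_j$, and $\widetilde{B}_k^{(c)}$) is ours and is meant as a sketch of the aggregation rather than a specification of the implementation.
\[
\widetilde{B}_k^{(c)}[i,j] \;=\; \mathrm{median}\,\bigl\{\, \beta_k^{(n)}(r_i, g_j) \;:\; n \in S_c \,\bigr\}, \qquad 1 \le i, j \le 20,
\]
where $S_c$ is the set of images in class $c$ and $\beta_k^{(n)}(r_i, g_j)$ is the $k$-th Betti number of image $n$ at red threshold $r_i$ and green threshold $g_j$. The third column is obtained in the same way, with the Betti number replaced by the number of activated pixels at $(r_i, g_j)$.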

These heatmaps reveal complementary structure that is not visible from single-channel
curves alone. For example, NV lesions concentrate most of their $\beta_0$ and
$\beta_1$ mass in a relatively compact block of lighter thresholds, indicating
uniform pigmentation with limited cross-channel variation. BKL and AKIEC classes show
broader, more diffuse high-intensity regions that extend toward darker red and green
levels, consistent with heterogeneous pigmentation and scattered foci. MEL lesions
exhibit a shift of the $\beta_1$ hotspot toward darker-red / mid-green thresholds,
suggesting complex hole patterns in specific color combinations that align with
irregular pigment networks and streaks. The activated-pixel maps further highlight
differences in overall lesion occupancy in the red–green plane, which our model uses
jointly with $\beta_0$ and $\beta_1$.

Overall, these visualizations support the view that multipersistence encodes
class-specific, multiscale topology that is coherent with known dermoscopic
morphologies. While we do not claim these descriptors are directly diagnostic on their
own, they offer an interpretable intermediate representation: the regions of the
red–green grid where our Betti tensors are most active correspond to characteristic
patterns of lesion fragmentation and hole formation that our TopoCon-MP model can
exploit during training.

\paragraph{t-SNE visualization of learned embeddings.}
\label{sec:tsne}

We use t-distributed Stochastic Neighbor Embedding (t-SNE) to qualitatively examine the structure of learned feature representations on the DermaMNIST dataset. This visualization aims to provide intuition about class-wise clustering behavior in the embedding space.

We compare embeddings extracted from a frozen ImageNet-pretrained Swin Transformer (Tiny) with embeddings produced by a topology-augmented model that fuses Swin-T image features with multipersistence (MP) topological descriptors.

\begin{figure}[H]
    \centering
    \includegraphics[width=0.48\textwidth]{figures/tsne_vanilla_swin_test.png}
    \hfill
    \includegraphics[width=0.48\textwidth]{figures/tsne_topoconmp_fused_frozen_test.png}
    \caption{t-SNE visualization of DermaMNIST embeddings.
    \textbf{Left:} embeddings from a frozen Swin-T backbone.
    \textbf{Right:} topology-augmented embeddings obtained via TopoCon-MP.
    Colors indicate the seven diagnostic classes.}
    \label{fig:tsne_dermamnist}
\end{figure}

The topology-augmented embeddings exhibit more compact intra-class clusters and clearer separation between several diagnostic categories than the baseline Swin-T representation. While t-SNE provides only a qualitative view of the embedding space, the observed clustering patterns suggest that incorporating multipersistence topological information introduces complementary structural cues that enhance feature discrimination.




\begin{figure}[t]
\centering
\includegraphics[width=.87\linewidth]{figures/median_heatmaps_Milk10k.png}
\caption{\footnotesize
\textbf{Multipersistence heatmaps for MILK-10K.}
Classwise median multiparameter descriptors computed on the red–green bifiltration grid
($20\times 20$ thresholds). Rows correspond to lesion classes
(BCC, NV, BKL, MEL, AKIEC) and columns show $\beta_0$, $\beta_1$, and activated-pixel
counts, respectively. Color encodes the median value at each grid point, highlighting
distinct patterns of connected components, holes, and overall lesion occupancy across
classes in the multipersistence representation.
}
\label{fig:MP-heatmaps-milk}
\end{figure}

\begin{figure}[t]
\centering
\includegraphics[width=.87\linewidth]{figures/median_heatmaps_DermaMNIST.png}
\caption{\footnotesize
\textbf{Multipersistence heatmaps for DermaMNIST.}
Classwise median multiparameter descriptors computed on the red–green bifiltration grid
($20\times 20$ thresholds). Rows correspond to lesion classes
(BCC, NV, BKL, MEL, AKIEC) and columns show $\beta_0$, $\beta_1$, and activated-pixel
counts, respectively. Color encodes the median value at each grid point, highlighting
distinct patterns of connected components, holes, and overall lesion occupancy across
classes in the multipersistence representation.}

\label{fig:MP-heatmaps-derma}
\end{figure}

\begin{figure}[t]
\centering
\includegraphics[width=.87\linewidth]{figures/median_heatmaps_PAD.png}
\caption{\footnotesize
\textbf{Multipersistence heatmaps for PAD-UFES.}
Classwise median multiparameter descriptors computed on the red–green bifiltration grid
($20\times 20$ thresholds). Rows correspond to lesion classes
(BCC, NV, BKL, MEL, AKIEC) and columns show $\beta_0$, $\beta_1$, and activated-pixel
counts, respectively. Color encodes the median value at each grid point, highlighting
distinct patterns of connected components, holes, and overall lesion occupancy across
classes in the multipersistence representation.}

\label{fig:MP-heatmaps-pad}
\end{figure}

