\section{Materials and Methods}
\label{sec:methods}


\subsection{Equivariant CNN}
\label{sec:methods:cnn}


To enable unsupervised equivariant feature learning~(Sec.~\ref{sec:methods:unsupervised}), we use an equivariant CNN as a feature extractor.
To achieve rotational equivariance, we adopt symmetric rotation-equivariant convolution (SRE-Conv) kernels~\cite{du2025_SRE}, which are centrally symmetric and efficiently parameterized to minimize redundancy. A proof of SRE-Conv's equivariance is provided in the Appendix (Sec.~\ref{sec:appendix:proof}).
We construct a fully convolutional network, SRENet, by replacing every standard (non-equivariant) convolution layer of ResNet18~\cite{He2016-de} with an SRE-Conv layer, using kernel sizes of 9, 9, 5, and 5 in the network's four main stages, respectively.
We also incorporate an equivariant pooling layer followed by a 1$\times$1 convolution with a stride of 1, which ensures that convolutions are applied at consistent spatial positions.
The classification head applies global adaptive pooling to the extracted feature maps; because pooling over all spatial positions discards where a feature occurs, the classifier's output is invariant to rotations of the input.
We pre-train SRENet on a supervised classification task so that it learns equivariant histopathologic image features.
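The exact SRE-Conv parameterization is defined in~\cite{du2025_SRE}; the NumPy sketch below only illustrates the underlying idea under our own simplifying assumption that all kernel taps at the same rounded distance from the kernel center share one weight (a $9\times9$ kernel then has 7 free weights instead of 81). Such a kernel equals its own $90^\circ$ rotation, so for a rotation $R$ by a multiple of $90^\circ$, a stack of these convolutions with pointwise ReLUs satisfies $\mathcal{F}(Rx) = R\,\mathcal{F}(x)$, and the spatial mean computed by global average pooling is unchanged. The helper names (\texttt{n\_rings}, \texttt{radial\_kernel}, \texttt{conv2d\_same}, \texttt{tiny\_sre\_net}) are hypothetical and not part of SRENet.

```python
import numpy as np


def n_rings(size):
    """Number of distinct rounded radii in an odd size x size kernel."""
    c = size // 2
    return int(np.rint(np.hypot(c, c))) + 1


def radial_kernel(weights, size):
    """Hypothetical SRE-style kernel: taps at the same rounded distance from
    the center share one weight, so the kernel is centrally symmetric and
    unchanged by 90-degree rotations."""
    c = size // 2
    yy, xx = np.mgrid[:size, :size]
    ring = np.rint(np.hypot(yy - c, xx - c)).astype(int)  # ring index of each tap
    return np.asarray(weights, dtype=float)[ring]


def conv2d_same(image, kernel):
    """Zero-padded 'same' cross-correlation of a 2-D image with an odd kernel."""
    k = kernel.shape[0]
    p = k // 2
    padded = np.pad(image, p)
    h, w = image.shape
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def tiny_sre_net(image, w1, w2):
    """Two SRE-style conv layers (9x9 then 5x5) with pointwise ReLUs."""
    h = np.maximum(conv2d_same(image, radial_kernel(w1, 9)), 0.0)
    return np.maximum(conv2d_same(h, radial_kernel(w2, 5)), 0.0)


rng = np.random.default_rng(0)
img = rng.random((32, 32))
w1, w2 = rng.normal(size=n_rings(9)), rng.normal(size=n_rings(5))  # 7 and 4 free weights
feat = tiny_sre_net(img, w1, w2)
feat_rot = tiny_sre_net(np.rot90(img), w1, w2)
print(np.allclose(feat_rot, np.rot90(feat)))     # equivariance: True
print(np.isclose(feat.mean(), feat_rot.mean()))  # invariance after global average pooling: True
```

Exact equality holds only for $90^\circ$ rotations on the pixel grid; for arbitrary angles this toy kernel is only approximately equivariant.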






\subsection{Unsupervised Equivariant Feature Learning}
\label{sec:methods:unsupervised}


Because every layer of SRENet is rotation equivariant, equivariant image features can be extracted from any layer of the network.
To extract features, we feed an input image into SRENet and take the feature map $\mathcal{F}_L$ produced by its $L$-th layer.
The feature map $\mathcal{F}_L$ is resized to a spatial resolution of $128\times128$. To avoid edge artifacts at the tissue-background boundary, we identify non-background pixels with a tissue mask created by intensity thresholding and morphological operations.
We use this mask to retain only the feature embeddings of $\mathcal{F}_L$ at tissue pixels.
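The masking step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the input is a grayscale image scaled to $[0,1]$ in which slide background is brighter than tissue, the threshold of 0.8 and the $3\times3$ structuring element are hypothetical values, and the morphological operations are taken to be a binary opening (removing isolated specks) followed by a closing (filling small holes). The feature map is assumed to be stored channel-last with the same $128\times128$ spatial size as the mask; all function names are illustrative.

```python
import numpy as np


def _windows(mask, r):
    """Shifted copies of `mask`, one per offset in a (2r+1) x (2r+1) window."""
    padded = np.pad(mask, r, mode="edge")  # replicate borders to avoid edge effects
    h, w = mask.shape
    return [padded[dy:dy + h, dx:dx + w]
            for dy in range(2 * r + 1) for dx in range(2 * r + 1)]


def dilate(mask, r=1):
    return np.logical_or.reduce(_windows(mask, r))


def erode(mask, r=1):
    return np.logical_and.reduce(_windows(mask, r))


def tissue_mask(gray, threshold=0.8, r=1):
    """Hypothetical tissue mask: intensity threshold, then opening and closing."""
    mask = gray < threshold                # tissue is darker than the background
    mask = dilate(erode(mask, r), r)       # opening removes isolated specks
    return erode(dilate(mask, r), r)       # closing fills small holes


def masked_features(feature_map, mask):
    """Keep the embeddings at tissue pixels: (H, W, C) -> (N, C)."""
    return feature_map[mask]


gray = np.ones((16, 16))
gray[4:12, 4:12] = 0.3   # a square of "tissue"
gray[0, 0] = 0.1         # an isolated dark speck
gray[7, 7] = 1.0         # a one-pixel hole inside the tissue
mask = tissue_mask(gray)
print(mask.sum())        # 64: speck removed, hole filled
```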
We then randomly sample $n$ feature embeddings from the masked (tissue) region of $\mathcal{F}_L$ and cluster them with unsupervised K-means into $K$ distinct clusters.
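K-means seeks centroids $\{\mu_k\}_{k=1}^{K}$ that minimize $\sum_{i=1}^{n}\min_{k}\lVert f_i-\mu_k\rVert^2$ over the sampled embeddings $f_i$. The sketch below implements sampling and Lloyd's algorithm with k-means++ seeding in plain NumPy as a stand-in; it is not necessarily the implementation used in this work, and the function names and iteration limit are hypothetical. Feature maps are assumed to be channel-last.

```python
import numpy as np


def sample_embeddings(feature_map, mask, n, rng):
    """Randomly sample up to n tissue embeddings from a (H, W, C) feature map."""
    tissue = feature_map[mask]                                   # (N, C)
    idx = rng.choice(len(tissue), size=min(n, len(tissue)), replace=False)
    return tissue[idx]


def sq_dists(x, centers):
    """Squared Euclidean distances between rows of x and centers: (n, K)."""
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)


def kmeans_fit(x, k, rng, n_iter=100):
    """Lloyd's algorithm with k-means++ seeding; returns (K, C) centroids."""
    x = np.asarray(x, dtype=float)
    centers = x[[rng.integers(len(x))]]
    while len(centers) < k:                           # k-means++: favour far points
        d2 = sq_dists(x, centers).min(axis=1)
        centers = np.vstack([centers, x[rng.choice(len(x), p=d2 / d2.sum())]])
    for _ in range(n_iter):
        labels = sq_dists(x, centers).argmin(axis=1)  # assignment step
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])           # update step
        if np.allclose(new, centers):
            break
        centers = new
    return centers


rng = np.random.default_rng(0)
fmap = np.concatenate([rng.normal(0.0, 0.05, (10, 5, 2)),    # two synthetic "tissue types"
                       rng.normal(5.0, 0.05, (10, 5, 2))], axis=1)
mask = np.ones((10, 10), dtype=bool)
x = sample_embeddings(fmap, mask, n=60, rng=rng)
centers = kmeans_fit(x, k=2, rng=rng)
```

Seeding with k-means++ rather than uniformly random initial centroids makes it less likely that two centroids start inside the same cluster.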


\subsection{Unsupervised Segmentation}
\label{sec:methods:segmentation}

To segment a test image, we feed it into SRENet and extract the $L$-th layer feature map $\mathcal{F}_L$, which is resized to $128\times128$ as in Sec.~\ref{sec:methods:unsupervised}.
The feature embeddings at all tissue pixels of the mask are then assigned to clusters by the fitted K-means model, i.e., each embedding receives the label of its nearest cluster centroid. Using the pixel positions recorded by the mask, the predicted labels are mapped back to their original locations, and the resulting label map is upscaled to the input image size.
This process yields a cluster label image for each input image.
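The segmentation step can be sketched as follows, assuming channel-last feature maps, a boolean tissue mask of the same $128\times128$ size, centroids from a fitted K-means model, and an input size that is an integer multiple of the feature-map size. Nearest-neighbor upscaling is our assumption (interpolating integer cluster labels would create meaningless intermediate values), and marking non-tissue pixels with $-1$ is a hypothetical convention; the function names are illustrative.

```python
import numpy as np


def predict_labels(x, centers):
    """Assign each (C,) embedding in x to its nearest centroid."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def segment(feature_map, mask, centers, out_shape, background=-1):
    """Cluster label image for one test image.

    feature_map: (h, w, C) resized features; mask: (h, w) bool tissue mask;
    out_shape: (H, W) input size, assumed to be integer multiples of (h, w).
    """
    h, w = mask.shape
    if out_shape[0] % h or out_shape[1] % w:
        raise ValueError("this sketch assumes an integer upsampling factor")
    labels = np.full((h, w), background, dtype=int)            # non-tissue pixels
    labels[mask] = predict_labels(feature_map[mask], centers)   # back to mask positions
    fy, fx = out_shape[0] // h, out_shape[1] // w
    return np.repeat(np.repeat(labels, fy, axis=0), fx, axis=1)  # nearest-neighbour upscaling


centers = np.array([[0.0, 0.0], [5.0, 5.0]])
fmap = np.zeros((4, 4, 2))
fmap[:, 2:] = 5.0                  # right half resembles cluster 1
mask = np.ones((4, 4), dtype=bool)
mask[0, 0] = False                 # one background pixel
seg = segment(fmap, mask, centers, out_shape=(8, 8))
```

For a $512\times512$ input, for example, each label in the $128\times128$ map would cover a $4\times4$ block of output pixels.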




