\section{Background}
\label{sec:background}


\subsection{Equivariance in Computer Vision}
\label{sec:background:equivariance}


Equivariance under a geometric transformation is the property of a function or model whereby transforming the input, e.g. by rotation or reflection, and then applying the function yields the same result as applying the function first and then transforming its output.
Formally, a function $F$ is equivariant to a transformation $T$ if $T(F[f(x, y)]) = F[T(f(x, y))]$, where $f(x,y)$ is an input signal in 2D space; invariance is the special case $F[T(f(x, y))] = F[f(x, y)]$, in which the output does not change at all.
The ability to capture equivariant features is important for feature extractors in image processing and pattern recognition, especially for image domains that lack explicitly meaningful orientation, e.g. digital histopathologic images.
A robust feature extractor should capture the intrinsic content of the data while remaining insensitive to nuisance variations among inputs, so that it can extract consistent high-level features.
Such variations commonly include rigid transformations such as two-dimensional translation, rotation, and reflection.
Translation and scale invariance are most relevant for natural images, which typically have an explicit orientation, e.g. a horizon line, and for certain radiology images, which have explicit anatomical orientations.
In contrast, rotation and reflection equivariance, i.e. the ability to produce outputs that transform consistently with, or remain unchanged under, rotations and reflections of the input, is particularly important in histopathology image analysis, since these images have no canonical orientation.
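
The definition of equivariance given above can be checked numerically. The following minimal NumPy sketch is illustrative only: the helper names (\texttt{is\_equivariant}, \texttt{relu}, \texttt{horizontal\_diff}, etc.) are hypothetical, and a random array stands in for an image. It tests $T(F[f]) = F[T(f)]$ for a pointwise function, which commutes with rotation and reflection, and for a directional finite difference, which does not.

```python
import numpy as np


def is_equivariant(F, T, x):
    """Numerically check T(F(x)) == F(T(x)) for one input x."""
    return np.allclose(T(F(x)), F(T(x)))


def rot90(x):
    """T: rotate a 2D signal by 90 degrees counter-clockwise."""
    return np.rot90(x)


def flip_lr(x):
    """T: reflect a 2D signal left-right."""
    return np.fliplr(x)


def relu(x):
    """F: pointwise non-linearity, applied to each value independently of its position."""
    return np.maximum(x, 0.0)


def horizontal_diff(x):
    """F: difference between each pixel and its left neighbour (with wrap-around).
    It has a preferred direction, so it should not commute with a 90-degree rotation."""
    return x - np.roll(x, 1, axis=1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))  # random stand-in for the 2D signal f(x, y)

print(is_equivariant(relu, rot90, x))             # True
print(is_equivariant(relu, flip_lr, x))           # True
print(is_equivariant(horizontal_diff, rot90, x))  # False
```

A single random input can only refute equivariance, not prove it; the check is meant as a sanity test of the definition rather than a proof.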


Deep learning has greatly benefited from the convolutional neural network (CNN). 
Because convolution is translation equivariant, a CNN detects the same feature at any position in the input, and the corresponding response simply shifts with the input.
However, CNNs are not, in general, equivariant to rotation or reflection.
% (\cref{fig:introduction}). 
Even a slight rotation or reflection of the image can result in a dramatic degradation of performance in biomedical image classification~\cite{du2025_SRE}. 
For example, a CNN may correctly classify an H\&E image as ``tumor'' yet misclassify the same image as ``benign'' after it is rotated by 90 degrees or merely reflected left-right.
A common remedy is geometric data augmentation, which rotates and flips images during training and thereby increases the effective number of training samples.
Modern CNNs have an extremely large number of degrees of freedom (DoFs), i.e. trainable parameters, which allows them to compensate for geometric changes by fitting these additional samples; however, the features they learn for a rotated or reflected image remain inherently different from those of the original, even though both depict the same object.
Therefore, there is a need to learn equivariant imaging features that represent an image consistently and robustly regardless of its orientation.
The equivariant feature learning approach proposed in this project addresses this significant limitation of CNN methods.
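
The contrast between translation and rotation can be reproduced with a minimal NumPy sketch of a single convolution layer. It assumes periodic (wrap-around) boundaries so that translations are exact, uses a random kernel in place of a learned one, and introduces the hypothetical helpers \texttt{circular\_corr2d} and \texttt{translate}; like most deep learning libraries, it computes cross-correlation rather than flipped-kernel convolution.

```python
import numpy as np


def circular_corr2d(x, k):
    """2D cross-correlation with periodic (wrap-around) boundaries:
    out[i, j] = sum_{a, b} k[a, b] * x[(i + a) % H, (j + b) % W].
    This is what a CNN 'convolution' layer computes, up to padding and bias."""
    out = np.zeros(x.shape)
    for a in range(k.shape[0]):
        for b in range(k.shape[1]):
            # np.roll with a negative shift brings x[i + a, j + b] to position (i, j)
            out += k[a, b] * np.roll(x, shift=(-a, -b), axis=(0, 1))
    return out


def translate(z):
    """T: cyclic translation by 2 rows and 3 columns."""
    return np.roll(z, shift=(2, 3), axis=(0, 1))


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))  # random stand-in for an image
k = rng.standard_normal((3, 3))    # unconstrained kernel, as a CNN would learn

# Translation: shifting the input shifts the feature map by the same amount.
print(np.allclose(circular_corr2d(translate(x), k), translate(circular_corr2d(x, k))))  # True
# Rotation: a rotated input does not give a rotated feature map.
print(np.allclose(circular_corr2d(np.rot90(x), k), np.rot90(circular_corr2d(x, k))))    # False
```

Data augmentation does not change this: it only exposes the unconstrained kernel to more rotated samples, so the layer itself still lacks rotation equivariance.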



\subsection{Equivariant Feature Learning Approaches}
\label{sec:background:related_work}

A variety of approaches have been proposed to achieve equivariance in CNNs.
\textit{Orientation-aware neural networks} learn orientation information from the data during training and use it either to re-align images to a standardized orientation~\cite{Jaderberg2015-zc} or to align all image gradients to a common orientation~\cite{Hao2022-xt}.
The rotation-equivariant vector field network~\cite{Marcos2016-rm} applies filters at multiple orientations and outputs vector fields that encode the magnitude and orientation of the strongest response.
These approaches introduce extra learnable parameters into the model, which can lead to over-fitting.
Such methods also tend to fail when the input has no specific orientation to align to, e.g. histopathological images.

\textit{Rotation-encoded neural networks} encode pre-defined rotation transformations using circular harmonics~\cite{Worrall2017-mv}, steerable filters~\cite{Cesa2021-tu, Weiler2019-yf, Weiler2018-mb}, or group-equivariant operations~\cite{Cohen2016-st}, or actively rotate the filters during convolution~\cite{Zhou2017-sh}.
Related approaches rotate the filters to obtain rotation-invariant responses~\cite{Chidester2019-qy, Linmans2018-dh}, or rotate the feature maps produced by rotated convolutional filters~\cite{Follmann2018-oo} to embed features in four different orientations.
Alternatively, some methods process multiple rotated copies of the input simultaneously so that the network becomes aware of orientation relationships~\cite{Cabrera-Vives2017-iz, Gupta2020-xr, Yao2022-ku, Zhou2022-tx}.
Processing rotated inputs is theoretically equivalent to rotating the filters.
However, the model size and computational cost of these methods grow with the number of pre-defined angles.
Moreover, they generalize poorly to angles outside the pre-defined set.
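
The equivalence between rotating inputs and rotating filters can be made concrete with a minimal NumPy sketch of a ``lifting'' correlation over the four 90-degree rotations (the cyclic group $C_4$). The helpers \texttt{valid\_corr2d} and \texttt{lifting\_corr} are hypothetical names introduced for illustration; this is a generic construction, not the implementation of any of the cited works. Because it relies on exact pixel-grid rotations, it only covers multiples of 90 degrees, and its compute grows linearly with the number of orientations.

```python
import numpy as np


def valid_corr2d(x, k):
    """'Valid' 2D cross-correlation: out[i, j] = sum_{a, b} x[i + a, j + b] * k[a, b]."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * x[a:a + h, b:b + w]
    return out


def lifting_corr(x, k):
    """Correlate x with the four 90-degree rotations of k.
    Returns shape (4, H, W): axis 0 indexes the filter orientation r (angle 90 * r)."""
    return np.stack([valid_corr2d(x, np.rot90(k, r)) for r in range(4)])


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))  # random stand-in for an image
k = rng.standard_normal((5, 5))    # one unconstrained kernel, used at 4 orientations

R = lifting_corr(x, k)
R_rot = lifting_corr(np.rot90(x), k)

# Rotating the input rotates every response map and cyclically shifts the orientation axis.
expected = np.rot90(np.roll(R, 1, axis=0), axes=(1, 2))
print(np.allclose(R_rot, expected))                              # True

# Max-pooling over orientations yields a map that simply rotates with the input.
print(np.allclose(R_rot.max(axis=0), np.rot90(R.max(axis=0))))  # True
```

The identity behind the first check is that correlating a rotated input with a kernel equals rotating the output of correlating the original input with the inversely rotated kernel, so a rotation of the input only permutes the four orientation channels.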

\textit{Rotation-equivariant coordinate systems} ensure rotational equivariance by transforming the input data into a different coordinate system, e.g. cyclic coordinate systems~\cite{Mo2024-ks} or polar and log-polar coordinate systems~\cite{Esteves2017-ox, Kim2020-an, Paletta2022-jk}.
These methods exploit the property that a rotation about the origin in the Cartesian coordinate system becomes a translation along the angular axis in the polar coordinate system.
However, the polar mapping inherently loses phase information and distorts the image.
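
The polar-coordinate property can be made explicit with a short standard derivation; the notation is introduced here for illustration. Place the origin at the image centre, let $T_\alpha$ rotate the image by an angle $\alpha$, i.e. $(T_\alpha f)(x, y) = f(x\cos\alpha + y\sin\alpha,\, -x\sin\alpha + y\cos\alpha)$, and let $\tilde{f}(r, \theta) = f(r\cos\theta, r\sin\theta)$ denote the image resampled on a polar grid. Then
\[
  \widetilde{T_\alpha f}(r, \theta)
  = f\bigl(r\cos(\theta - \alpha),\, r\sin(\theta - \alpha)\bigr)
  = \tilde{f}(r, \theta - \alpha),
\]
so a rotation by $\alpha$ in the Cartesian image becomes a cyclic shift by $\alpha$ along the $\theta$ axis of $\tilde{f}$, which an ordinary translation-equivariant convolution with circular padding along $\theta$ can handle. The result depends on the chosen centre, and a polar grid with a fixed number of angular samples oversamples pixels near the centre and undersamples the periphery, which is one source of the distortion noted above.
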

\textit{Weight-symmetric convolution} methods explicitly constrain the convolution kernel weights to be symmetric~\cite{Yeh2016-uz}, e.g. under horizontal reflection~\cite{Dzhezyan2019-sk} or rotation~\cite{Dudar2019-wa,Fuhl2021-qo}, to obtain equivariance.
However, the performance of these methods was limited by small kernel sizes, e.g. 3$\times$3, which hindered the model's ability to learn expressive features.
The equivariant feature learning approach used in this paper builds on this rotationally symmetric kernel design strategy while addressing the aforementioned limitations~\cite{du2025_SRE,zhang2025_SREUnet}.
To our knowledge, such equivariant feature learning strategies have not previously been applied to histopathologic image analysis tasks.
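
The symmetry constraint behind weight-symmetric convolution can be illustrated with a minimal NumPy sketch. Here symmetry is imposed by averaging an unconstrained kernel over the eight rotations and reflections of the dihedral group $D_4$; this projection and the hypothetical helpers \texttt{valid\_corr2d} and \texttt{symmetrize\_d4} are assumptions for illustration and are not the kernel parameterization of \cite{du2025_SRE,zhang2025_SREUnet}. The sketch checks that a symmetric kernel makes the ``valid'' cross-correlation commute with 90-degree rotation and left-right reflection, whereas the unconstrained kernel does not.

```python
import numpy as np


def valid_corr2d(x, k):
    """'Valid' 2D cross-correlation: out[i, j] = sum_{a, b} x[i + a, j + b] * k[a, b]."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * x[a:a + h, b:b + w]
    return out


def symmetrize_d4(k):
    """Project a square kernel onto weights symmetric under the dihedral group D4:
    average over its four 90-degree rotations, then over a left-right reflection."""
    k_c4 = sum(np.rot90(k, r) for r in range(4)) / 4.0  # rot90(k_c4) == k_c4 (up to rounding)
    return (k_c4 + np.fliplr(k_c4)) / 2.0               # additionally fliplr-symmetric


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))  # random stand-in for an image
k = rng.standard_normal((5, 5))    # unconstrained kernel
ks = symmetrize_d4(k)              # weight-symmetric kernel

for name, T in [("rot90", np.rot90), ("fliplr", np.fliplr)]:
    free = np.allclose(valid_corr2d(T(x), k), T(valid_corr2d(x, k)))
    sym = np.allclose(valid_corr2d(T(x), ks), T(valid_corr2d(x, ks)))
    print(name, free, sym)
# rot90 False True
# fliplr False True
```

Under this constraint a 5$\times$5 kernel has only 6 free weights, one per orbit of kernel positions under the eight symmetries, instead of 25, so enlarging a symmetric kernel adds far fewer parameters than enlarging an unconstrained one.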