The proposed method is a two-step process that takes the raw images as input and produces per-pixel classification labels as its final output, as shown in Figure \ref{fig:re-segmentation}. The pipeline of our approach is summarised in Algorithm \ref{alg:pseudocode}.

\begin{figure}
\centering
\begin{minipage}{0.74\textwidth}
\begin{algorithm2e}[H]
\caption{Tracking-Assisted Segmentation (TAS)}
\label{alg:pseudocode}
\KwIn{Sequence of frames}
\KwOut{Per frame cell segmentation and tracking information}
\For{frame in sequence} {%\vspace{0.5em}
	\textbf{Step 1}: Initial segmentation %\vspace{-0.3em}
	\begin{itemize}
		\setlength\itemsep{0em}
		\item Separate cell bodies from the background using U-Net %\vspace{-0.5em}
		\item Define cell identities using watershed or random walker %\vspace{-0.5em}
	\end{itemize}
	\textbf{Step 2}: Siamese tracking %\vspace{-0.8em}
	\begin{itemize}
		\setlength\itemsep{0em}
		\item Track each cell in small intervals %\vspace{-0.5em}
		\item Identify biological cell behaviours:%\vspace{-0.5em}
		    \begin{itemize}
		        \item \textbf{Collision}, if  \textit{match}(cell$_t$, cell$_{t-1}^{(1)}$, cell$_{t-1}^{(2)}$)%\vspace{-0.3em}
		        \item \textbf{Mitosis}, if  \textit{match}(cell$_t$, cell$_{t+1}^{(1)}$, cell$_{t+1}^{(2)}$) %\vspace{-0.3em}
		        \item \textbf{Cell death}, if  \textit{match}(cell$_t$, None) %\vspace{-0.3em}
		    \end{itemize}%\vspace{-0.5em}
		\item Re-segment cells using watershed %approach
	\end{itemize}
}
\end{algorithm2e}
\end{minipage}
\end{figure}


\subsection{Initial Segmentation} 
% Modified for R4 TAS differences
For the initial segmentation of cells, we propose three variants of our tracking-assisted segmentation (TAS) method, namely general, intermediate and specialised TAS. Here and henceforth, these approaches are referred to as g-TAS, i-TAS and s-TAS, respectively. Further, the term \emph{pre-processing} refers to thresholding and filtering the raw image, e.g., to account for varying illumination, cell size and morphology. With \emph{post-processing}, we refer to additional adjustments applied to the U-Net output, such as filling prediction gaps, thresholding smaller regions and fitting ellipses. We follow \citet{lux2019dic} for the pre- and post-processing steps, either per dataset (as in their original work) or across datasets. The three TAS variants differ in their pre- and post-processing steps and in how these steps involve parameter tuning to a specific dataset.
\begin{enumerate}
    \item g-TAS performs no pre-processing and no post-processing that relates to the U-Net approach or to any specific cell type.
    \item i-TAS tunes the pre-processing hyperparameters (associated with image normalisation, histogram equalisation and uneven illumination correction) independently per dataset (on the respective training sets), as explained in the work of \citet{lux2019dic}. This accounts for the very different ranges and distributions of pixel values, which adversely affect the U-Net predictions. The siamese-based re-segmentation step after the U-Net output (our main contribution) remains the same for all datasets.
    \item s-TAS adapts the pre- and post-processing hyperparameters per dataset, using the approach proposed by \citet{lux2019dic}. This variant is designed specifically for detecting precise cell boundaries, in order to minimise erroneous cell detections and to be competitive with optimised state-of-the-art approaches. s-TAS retains the same tracking-based re-segmentation step that we introduce.
\end{enumerate}
\subsubsection{General Tracking Assisted Segmentation (g-TAS)}
g-TAS is a generalised implementation of the tracking-assisted segmentation approach that requires no tuning or adjustment with respect to any dataset. First, a U-Net model \cite{unet} is explicitly trained across all data from all the different cell types (see Section \ref{subsec:data}). Next, the random walker algorithm of \citet{randomwalks} is used to generate the initial cell segmentation of the pipeline. Recent methods use the watershed algorithm for splitting cell body predictions from neural networks into smaller cells \cite{sharif2012red, lux2019dic}. However, the random walker algorithm has been shown to be more resilient to noise and hence more suitable for predictions over diverse image properties and cell morphologies \cite{randomwalks}. The output of the random walker is then fed into the tracker network to correct for mitosis, apoptosis and cell collisions (see Section \ref{subsubsec:mitosis}).

The U-Net was trained using data augmentation with additive noise, a shift of the pixel value range and a cutoff on the maximum value. These specific augmentations were chosen because the dataset images do not share a consistent, standard RGB value range and have a low signal-to-noise ratio, as indicated by \citet{cellTrackingChallenge}. The network is trained with a learning rate of $0.001$ using the Adam optimiser for 50 epochs. The processing of the raw images comprises (i) normalisation to zero mean and unit variance, (ii) construction of cell centroids from the U-Net predictions based on the distance to the background (where 0 indicates background and 1 the farthest foreground), and (iii) definition of distinct cells by running the random walker algorithm. All parameters in g-TAS are the same across all datasets. More details related to the experimental implementation are presented in Appendix \ref{sec:appendixC}. Unlike methods that use thresholds based on morphology \cite{lux2019dic}, the pre-processing of the raw images and the post-segmentation division of small cells are also generalised, and not specific to any dataset.
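
As a minimal sketch of the normalisation and augmentation steps described above, the following Python fragment operates on a flattened list of pixel intensities. The function names, the Gaussian form of the additive noise and all default parameter values (\texttt{noise\_sigma}, \texttt{max\_shift}, \texttt{cutoff}) are illustrative assumptions and are not the settings used in our experiments.

```python
import random
from statistics import fmean, pstdev


def normalise(pixels):
    """Rescale intensities to zero mean and unit variance."""
    mean = fmean(pixels)
    std = pstdev(pixels)
    if std == 0:
        # A constant image carries no contrast; map it to all zeros.
        return [0.0] * len(pixels)
    return [(p - mean) / std for p in pixels]


def augment(pixels, rng, noise_sigma=0.05, max_shift=0.1, cutoff=1.0):
    """Apply the three augmentations named in the text.

    Hypothetical parameters: noise_sigma (std of assumed Gaussian noise),
    max_shift (largest global intensity shift), cutoff (maximum value).
    """
    shift = rng.uniform(-max_shift, max_shift)  # pixel value range shift
    noisy = [p + shift + rng.gauss(0.0, noise_sigma) for p in pixels]
    return [min(p, cutoff) for p in noisy]  # cutoff on the maximum value


if __name__ == "__main__":
    rng = random.Random(0)
    image = [0.2, 0.9, 1.0, 0.4]
    print(normalise(augment(image, rng)))
```

In this sketch the shift is drawn once per image, while the noise is drawn independently per pixel; other choices are equally compatible with the description above.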

\subsubsection{Intermediate Tracking Assisted Segmentation (i-TAS)}
This approach relaxes the constraint of a general pre-processing method and uses the U-Net implementation of \citet{unet}, pre-trained as in the work of \citet{lux2019dic}. The same pre-processing steps are employed, and the model is manually fine-tuned to the image properties based on the type of cell image. To evaluate the effect of morphological boundary refinements on cell segmentation, the approach of defining unique cells from the initial segmentation, using the random walker algorithm, is kept the same as in g-TAS. 


\subsubsection{Specialised Tracking Assisted Segmentation (s-TAS)}
The third TAS variant further relaxes the constraint on dataset-specific tuning: the U-Net is trained on each dataset separately, and the watershed deconvolution method described in the work of \citet{lux2019dic} is applied instead of the random walker algorithm. Compared to i-TAS, the difference is that the post-processing implementation of our model now also uses several tuning parameters, which are typically adjusted for every dataset. Thus, this approach can be interpreted as a fine-tuned implementation of TAS that is specialised for a certain dataset.

The output of the initial segmentation (g-TAS, i-TAS or s-TAS without the siamese re-segmentation) produces finely segmented cells where the actual cell behaviours might not be correctly expressed. A schematic representation related to this issue is shown in Figure \ref{fig:collision}. To be able to correctly reason about cell movement over time, cells need to be correctly tracked as well.

\subsection{Siamese tracking} 
The location of the cell in subsequent frames is linked using siamese tracking. For tracking purposes, a SiamFC tracker \cite{siamfc}, pre-trained on the GOT-10k dataset \cite{huang2019got}, is used. Tracking is done in the forward as well as in the backward direction to predict the new location of any cell in the previous and next frames based on its position in the current frame. The cell segmentation is refined through tracking by detecting the occurrences of mitosis and collision events. For the two events, tracking is performed in opposite directions along the temporal dimension. 

The tracking module works as follows. Let $I_t$ denote the $t^{\text{th}}$ frame in a sequence of length $T$, and let $\mathcal{S}_t = \{ C_t^1,\dots,C_t^K\}$ be the set of detected cells in this frame. These are used to initialise the tracker at step $t$. For cell $C^i_t$, the locations predicted by the tracker in $I_{t+1}$ and $I_{t-1}$ are referred to as the forward $(F_t^i)$ and backward $(B^i_t)$ predictions, respectively. Collision and mitosis are then detected as described below. Note that the movement prediction model explained here does not depend on the morphology of the cell or on any other image property. Hence, it is directly applicable on top of any segmentation algorithm without the need for additional tuning.

% Modified for R4 step size
It is important to note that the tracker uses a step size of 1, in either the positive or the negative temporal direction. We experimented with other step sizes but found that mitotic events do not occur at a consistent step size; in our parameter search, a step size of 1 appeared appropriate. However, cells may collide and often stay close to other cells, with adjacent boundaries for several frames, before they separate again. When cells remain in contact for more than one frame, the re-segmentation splits them at the first frame in which a collision or mitosis is detected. From that frame onwards, the algorithm tracks both cells, re-segmenting them again if they collide in subsequent frames. In the event of a false negative, e.g. a mitotic event that is not detected, the algorithm treats the cell as a single larger cell. If the mitotic event becomes clearer in the next frame, i.e. the algorithm detects two independent cells, the larger cell is re-segmented into two, one frame later.


\subsubsection{Collision detection}
A collision occurs when two cells share a fraction of their boundary, in which case they can be mistaken for a single cell during segmentation. When processing a new frame $I_t$, where $t > 1$, collision detection is performed first: a cell $C_{t}^i$ is considered to be a lump of multiple individual cells if the centroids of two or more cells in $\mathcal{S}_{t-1}$ lie within the tracked region $B_{t}^i$. If this is the case, $C_{t}^i$ is re-segmented using the centroids of those cells in the previous frame $I_{t-1}$. This collision detection procedure continues until each cell in $I_t$ matches at most one cell in $I_{t-1}$.
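
The collision test can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes that each tracked region $B_t^i$ is an axis-aligned bounding box, and the names \texttt{Box} and \texttt{detect\_collisions} are hypothetical and not taken from our implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (row, col) centroid of a cell


@dataclass(frozen=True)
class Box:
    """Axis-aligned tracked region (assumed shape of a backward prediction)."""
    top: float
    left: float
    bottom: float
    right: float

    def contains(self, p: Point) -> bool:
        return self.top <= p[0] <= self.bottom and self.left <= p[1] <= self.right


def detect_collisions(backward: Dict[int, Box],
                      prev_centroids: Dict[int, Point]) -> Dict[int, List[int]]:
    """Map each cell in frame t to the cells of frame t-1 that it lumps together.

    Only cells whose backward region covers two or more previous centroids
    are reported; these are the candidates for re-segmentation.
    """
    collisions: Dict[int, List[int]] = {}
    for cell_id, region in backward.items():
        hits = sorted(j for j, c in prev_centroids.items() if region.contains(c))
        if len(hits) >= 2:
            collisions[cell_id] = hits
    return collisions


if __name__ == "__main__":
    backward = {0: Box(0, 0, 10, 10), 1: Box(20, 20, 30, 30)}
    previous = {0: (2.0, 2.0), 1: (8.0, 8.0), 2: (25.0, 25.0)}
    print(detect_collisions(backward, previous))
```

In the usage example, cell 0 of the current frame covers the centroids of two cells from the previous frame and is therefore flagged, whereas cell 1 matches a single cell and is left unchanged.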

\subsubsection{Mitosis detection} 
\label{subsubsec:mitosis}
For detecting mitotic events, a procedure similar to that of detecting collisions is performed on a sequence of frames, but in the reverse direction. Cells in $\mathcal{S}_{t-1}$ are matched to the detected cells in $I_{t}$. Namely, a cell $C_{t-1}^i$ is matched to a cell $C_{t}^j$ if the centroid of $C_{t}^j$ lies inside the region $F_{t-1}^i$. Unlike in collision detection, however, $C_{t-1}^i$ is also matched to $C_{t}^j$ if the centroid of the region $F_{t-1}^i$ lies within the boundaries of the cell $C_{t}^j$. This matching procedure yields a set of matches for each cell $C_{t-1}^i$, which we denote as $M_{t-1}^i$, with size $|M_{t-1}^i|$.
% Modified for R4 mitotis and apoptosis
The size $|M^i_{t-1}|$ is the number of cells in the subsequent frame that are associated (matched) with cell $C^i_{t-1}$ in the previous frame. If the number of matched cells is zero ($|M^i_{t-1}| = 0$), the cell has died; if it is one ($|M^i_{t-1}| = 1$), exactly one cell in the next frame is associated with the cell in the previous frame; if $|M^i_{t-1}| > 1$, a mitotic event has occurred. The state of cell $C_{t-1}^i$ is then determined according to Equation \eqref{eq:state}.

\begin{equation}
	\begin{split}
		C_{t-1}^i &= 
		\begin{cases}
		\text{Apoptosis}, &  |M_{t-1}^i| = 0\\
		M_{t-1,1}^i,   &  |M_{t-1}^i| = 1\\
		\text{Mitosis}, & \text{otherwise}
		\end{cases}
	\end{split}
	\label{eq:state}
\end{equation}
where apoptosis denotes cell death or the disappearance of the cell from the field of view, as described by \citet{ulman2017objective}.
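
The matching rule and the case distinction of Equation \eqref{eq:state} can be sketched as follows. The sketch makes two simplifying assumptions that are not part of the method as stated: the forward prediction and the extent of each segmented cell are both approximated by axis-aligned bounding boxes, and the helper names (\texttt{match\_forward}, \texttt{cell\_state}) are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Point = Tuple[float, float]              # (row, col)
Box = Tuple[float, float, float, float]  # (top, left, bottom, right)


def inside(box: Box, p: Point) -> bool:
    top, left, bottom, right = box
    return top <= p[0] <= bottom and left <= p[1] <= right


def box_centre(box: Box) -> Point:
    top, left, bottom, right = box
    return ((top + bottom) / 2.0, (left + right) / 2.0)


def match_forward(forward_box: Box,
                  next_cells: Dict[int, Tuple[Point, Box]]) -> List[int]:
    """Build the match set for one cell of frame t-1.

    next_cells maps a cell id in frame t to (centroid, bounding box).
    A cell matches if its centroid lies in the forward prediction, or if
    the centre of the forward prediction lies inside the cell's box.
    """
    centre = box_centre(forward_box)
    return sorted(j for j, (centroid, cell_box) in next_cells.items()
                  if inside(forward_box, centroid) or inside(cell_box, centre))


def cell_state(matches: List[int]) -> Union[str, int]:
    """Case distinction on the number of matches."""
    if not matches:
        return "apoptosis"   # no match: death or exit from the field of view
    if len(matches) == 1:
        return matches[0]    # tracking continues with the single match
    return "mitosis"         # two or more matches: the cell has divided


if __name__ == "__main__":
    cells_t = {0: ((5.0, 5.0), (3, 3, 7, 7)), 1: ((5.0, 15.0), (3, 13, 7, 17))}
    print(cell_state(match_forward((0, 0, 10, 20), cells_t)))
```

Returning the identifier of the single matched cell in the second case mirrors the middle branch of Equation \eqref{eq:state}, where the track is simply continued.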

In the case of mitosis, the cell splits; the tracking of $C_{t-1}^i$ therefore ends, and the cells in $M_{t-1}^i$ are initialised with two new trackers that have $C_{t-1}^i$ as their parent. New cells in $\mathcal{S}_{t}$ that are not linked to any cell in $I_{t-1}$ are interpreted as newly detected cells, which start their ``life'' in $I_{t}$ without a link to a parent cell.

\subsubsection{Re-segmentation} 
When a collision of two or more cells into a cell $C_{t}^i$ is detected, $C_{t}^i$ is re-segmented such that the new number of segments matches the number of colliding cells. This is achieved using watershed deconvolution as described in the work of \citet{kachouie2008watershed}. To prevent over-segmentation of the cell $C_{t}^i$, which adversely affects segmentation accuracy, the relative positions of the centroids of the cells in $I_{t-1}$ are used as the seeds for the segmentation algorithm. The re-segmentation of cells is illustrated in Figure \ref{fig:collision}.
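
The role of the seeds can be illustrated with a deliberately simplified stand-in. The Python sketch below does not implement watershed deconvolution; it assigns every pixel of the merged cell to its nearest seed, which is enough to show that the number of resulting segments equals the number of seeds. The function name \texttt{split\_by\_seeds} and the pixel-list representation are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Pixel = Tuple[int, int]  # (row, col) of a foreground pixel


def split_by_seeds(mask_pixels: Iterable[Pixel],
                   seeds: Dict[int, Tuple[float, float]]) -> Dict[int, List[Pixel]]:
    """Partition a merged cell into one segment per seed.

    Simplified stand-in for a seeded watershed: each pixel is given the
    label of the closest seed (squared Euclidean distance). Ties go to
    the seed that appears first in the dictionary.
    """
    segments: Dict[int, List[Pixel]] = {label: [] for label in seeds}
    for r, c in mask_pixels:
        nearest = min(
            seeds,
            key=lambda k: (r - seeds[k][0]) ** 2 + (c - seeds[k][1]) ** 2,
        )
        segments[nearest].append((r, c))
    return segments


if __name__ == "__main__":
    # A 2 x 6 merged blob and the centroids of two cells from frame t-1.
    blob = [(r, c) for r in range(2) for c in range(6)]
    parts = split_by_seeds(blob, {1: (0.5, 1.0), 2: (0.5, 4.0)})
    print({label: len(pixels) for label, pixels in parts.items()})
```

Because the labels are fixed by the seeds, the split cannot produce more segments than colliding cells, which is the property that guards against over-segmentation. A watershed additionally follows the intensity or distance landscape of the image, so its boundaries will in general differ from this nearest-seed partition.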

The above algorithm is designed as a simple and general method to track biological cells of different size, shape and movement patterns. Volatile trajectories and unpredictability of cell location are dealt with using one approach for all datasets, with the re-segmentation being invariant to cell morphology and image properties. 
% The re-segmentation refinement through tracking is performed without the need for manually picking parameters that work well for each specific dataset but is a uniform method invariant to cell morphology and image properties.
