\section{Introduction and Background}
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Instance segmentation of structures in microscopy images is a prerequisite for many downstream analyses. In recent years, many Deep Learning (DL) based approaches to microscopy image segmentation have been proposed~\cite{moen2019,caicedo2019_v2,schubert2018}.
Such methods can be divided into \textit{top-down} and \textit{bottom-up} methods.
Mask R-CNN~\cite{he2017}, for example, is arguably the most prominent top-down method, designed to detect object instances via bounding-boxes. An additional refinement step produces a pixel-mask from multiple predicted bounding-box detections.
Bottom-up methods, in contrast, are designed such that each pixel predicts the object class it belongs to~\cite{ronneberger2015} and/or the shape of the object instance it is part of~\cite{schmidt2018,neven2019,hirsch2020}. In a second phase, all methods need to consolidate their detections/predictions in order to obtain the final set of object instances.
Mask R-CNN~\cite{he2017} and StarDist~\cite{schmidt2018}, for example, avoid multiple detections of the same object by employing non-maximum suppression on an instance-associated confidence score.
While DL-based methods helped to improve microscopy image data segmentation considerably, automated results are still subject to many errors that need to be addressed with manual post-processing. 

An additional complication comes from differences between the domain of natural and microscopic images. 
While objects in natural images are typically either vertically or horizontally aligned, objects in microscopy typically have complex and unique shapes and are randomly oriented. 
Hence, methods that employ axis-aligned bounding boxes, such as Mask R-CNN, tend to perform rather poorly.
StarDist addresses this shortcoming by assuming star-convexity of the objects to be segmented.
While being the key to success for some datasets, this assumption backfires when morphologically more complex shapes need to be segmented. 
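For reference, and using notation introduced here only for illustration, a region $S$ is star-convex if there exists a point $c \in S$ such that, for every $x \in S$, the line segment connecting $c$ and $x$ lies entirely within $S$:
\[
\{(1-t)\,c + t\,x \;:\; t \in [0,1]\} \subseteq S \quad \text{for all } x \in S .
\]
Shapes for which no such point $c$ exists, \eg strongly curved or branching structures, violate this assumption.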

Another shortcoming of today's segmentation landscape is that most methods only operate on 2D image data. 
Methods to segment volumetric data (3D image data), despite being desperately needed, are much less common.
Existing 3D implementations either perform volumetric data segmentation by combining results on 2D slices~\cite{stringer2020}, or, if directly operating on 3D images, tend to require large and expensive GPU hardware (see \eg Table~\ref{tab:results3d}).

Here we present \EmbedSeg\footnote{A memory-efficient open-source implementation of \EmbedSeg is available on GitHub.}, a very compact model for end-to-end instance segmentation that is a variation of the inspiring work in~\cite{neven2019}.
Each pixel predicts its own \textit{spatial embedding}, \ie another unique pixel location that is meant to represent the object this pixel is part of. 
Additionally, the network learns an instance-specific clustering band-width, later used to cluster embedding pixels into object instances. 
The segmentation mask of an object is defined by all pixels that point to the same cluster of embedding pixels.
An additional \textit{seediness score} for each pixel is predicted, indicating how likely it is for the respective pixel, and its associated clustering band-width, to represent an object instance. 
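As a minimal sketch of this idea, assuming the Gaussian formulation of~\cite{neven2019} and using notation chosen here purely for illustration, let pixel $i$ at location $x_i$ predict an offset $o_i$, so that its spatial embedding is $e_i = x_i + o_i$.
With $C_k$ denoting the embedding-space center of instance $k$ and $\sigma_k$ its clustering band-width, the affinity of pixel $i$ to instance $k$ can be written as
\[
\phi_k(e_i) = \exp\left(-\frac{\|e_i - C_k\|^2}{2\sigma_k^2}\right),
\]
and pixel $i$ is assigned to instance $k$ when $\phi_k(e_i)$ exceeds a fixed threshold (\eg $0.5$).
A larger $\sigma_k$ thus tolerates embeddings that lie farther from $C_k$, which is what allows large and small objects to be clustered with different margins.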


We propose several modifications that greatly improve the performance of embedding-based instance segmentation on microscopy data.
Importantly, \EmbedSeg is not limited to 2D images but can directly be trained and applied on volumetric 3D data.
Instance segmentation results on three 2D and four 3D datasets are presented in Section~\ref{sec:results} and Tables~\ref{tab:results2d} and~\ref{tab:results3d}. 
%
Last but not least, we make all four 3D datasets used in this work, together with their respective training labels, publicly available\footnote{Data download links can be found on GitHub as well.\\ (\url{https://github.com/juglab/EmbedSeg})}.